Preface


The purpose of this thesis was to develop experimental descriptive regression models for
estimating the job performance, or productive capacity (PC), of Air Force Aerospace Ground
Equipment (AGE) mechanics. The data that I used were collected under the Air Force’s Productive
Capacity Project by myself and other personnel from the Manpower and Personnel Research
Division, Human Resources Directorate, Armstrong Laboratory (AL/HRM), and contractor personnel
from the Human Resources Research Organization (HumRRO), Alexandria, VA, and the Systems
Research and Applications (SRA) Corporation, San Antonio, TX. I was fortunate enough to work
with these talented people in the project planning, data collection and preliminary analyses of
the collected data.

The  Productive  Capacity  Project  is  part  of  an  ongoing  research  and  development  effort  aimed 
at  identifying  methods  for  best  using  Air  Force  personnel.  The  Air  Force  recognizes  that  in  this 
day  of  force  downsizing  and  shrinking  defense  budgets,  it  must  make  optimal  use  of  its  personnel 
resources.  Making  best  use  of  personnel  resources  implies  the  need  to  be  able  to  validly  and  reliably 
measure  and  quantify  airmen  job  performance.  It  also  implies  the  need  to  be  able  to  model,  or 
predict,  the  job  performance  of  Air  Force  applicants  and  incumbent  personnel.  Mathematical 
modeling of job performance can contribute substantially to the Air Force’s ability to better plan
and use its manpower resources. This thesis research fits into the bigger picture of optimal use of
resources by providing analyses of the effects of two important predictors, aptitude and experience,
on airmen job performance.

This thesis would not have been possible without the continuous help and guidance from
the personnel of AL/HRM. I am most indebted to Ms. Jacobina Skinner for her enormous help
in gathering background material and in providing non-stop consultations throughout. I am also
grateful to Mr. Bill Glasscock for his help in creating the impeccable data files that were provided
to me. And, this thesis would literally not have been possible if Lieutenant Colonel Roger Alford
had not granted me permission to use the Project data. My thanks go to all of them.

Next, I wish to express my appreciation to Professor Daniel Reynolds for serving as the thesis
advisor. His continual guidance saved me from going too far astray on many occasions. And, I say
thanks to Lieutenant Colonel Kenneth Bauer for his help and patience while serving as a reader
and  department  representative  for  this  effort. 

I would be remiss if I did not thank my many classmates who helped me through this effort.
In  particular,  I  extend  my  appreciation  to  Captains  Tim  Mott,  Randy  McCanne  and  Tom  Sterle 
for  their  unending  moral  support. 

Last,  and  most  of  all,  I  wish  to  thank  my  loving  and  supportive  wife,  Mary  Jean,  and  my 
children for supporting me and tolerating my absence (even when I was there).


Robert S. Faneuff

Abstract

This study investigated the effects of mechanical aptitude and job experience on the job
performance of 204 Air Force Aerospace Ground Equipment (AGE) mechanics. Job performance
was expressed as productive capacity (PC), which is derived from estimated performance times on
job tasks. PC measures were derived for 50 tasks typically performed by airmen in the specialty.
Aptitude measures took the form of Mechanical percentile composite scores on the Armed Services
Vocational Aptitude Battery (ASVAB). A second-order logistic model was used to regress PC on
aptitude and experience at the task level and at the overall job, or aggregate, level. Model R²s were
generally low. For the tasks, R²s ranged from .01 to .13, and for the aggregate model the R² was
about .16. Generally, experience was a significant predictor but aptitude was not. There was also
no indication of an aptitude/experience interaction. These results were verified through forward
stepwise regression. There was some evidence that airmen may experience some skill degradation
on production-type tasks at around the six-year point as they transition to supervisory roles.

PREDICTING THE PRODUCTIVE CAPACITY OF AIR FORCE
AEROSPACE GROUND EQUIPMENT PERSONNEL USING
APTITUDE AND EXPERIENCE MEASURES

I. Introduction

1.1 General Issue

Over the last several years, the Air Force has conducted numerous research activities aimed at
developing sound ways of measuring the job performance of its personnel. These research activities
were the result of three primary requirements (U>:1) (30:i). First, program managers in the Air
Force’s manpower, personnel and training communities expressed concern that job performance
measures were needed for the evaluation of their training and selection programs. Second, managers
of Air Force research and development (R&D) programs needed job performance measures to serve
as objective criteria for assessing the impact of various factors on individual and unit effectiveness.
The third and most pressing requirement was a directive issued to the armed services in 1980 by
the Assistant Secretary of Defense (Manpower, Reserve Affairs and Logistics). The directive tasked
the services to link their enlistment aptitude standards to job performance. This of course required
the services to develop valid job performance measurement systems. Adding to the force of the
directive, the House Committee on Appropriations tasked the Office of the Secretary of Defense in
1983 to provide direct oversight for joint-service research activities to address the measurement of
military job performance and the linkage of job performance to enlistment standards.

These initial requirements provided the impetus for the planning and execution of several
major R&D efforts by the services throughout the 1980s. These research efforts were accomplished
primarily under a joint-service program called the Job Performance Measurement (JPM)/Enlistment
Standards Project.

By 1990, the Air Force had developed a detailed job performance measurement system and
had essentially fulfilled all the initial requirements. The Air Force, however, did not elect to
abandon its research on job performance. Instead, it began the Productive Capacity Project in
1990 to continue its research on the development and potential uses of job performance measures.
The Air Force felt that much more could be gained through job performance R&D. It recognized
that job performance research could be of great potential value in force acquisition and manpower
modeling and planning. For instance, it saw that if job performance could be modeled or predicted
for those desiring to enter the service, those who would likely perform well could be identified for
selection. Also, the Air Force saw that if it could model or predict the performance of its incumbent
personnel, airmen could potentially be allocated or assigned to jobs so that manpower resources
are best used.

The need for sound manpower modeling and planning has been highlighted by several recent
events  which  include  a  virtual  end  to  the  Cold  War,  Operations  Desert  Shield  and  Desert  Storm, 
sending  of  troops  to  United  Nations  (UN)  sponsored  activities,  defense  budget  cuts,  and  force 
downsizing  (30;i).  There  seems  to  be  a  trend  of  increasing  world  instability  and  a  decreasing 
military  to  deal  with  it.  What  the  future  likely  holds  for  the  military  is  increasing  demands  placed 
on  a  smaller  force.  There  is  no  doubt  then  that  manpower  resources  must  be  planned  and  used 
wisely.  This  means  the  Air  Force  must  be  able  to  validly  measure  job  performance  and,  more 
importantly,  be  able  to  predict  it  for  its  personnel. 

Since the Air Force does not have a crystal ball to help it predict the performance of its
applicants and incumbents, it has typically relied on job performance models to do the prediction.
The most frequently used models have been regression-based mathematical models.

Unfortunately, the development of such models can be frustrating. Development of job
performance models involves numerous elusive problems that have plagued Industrial/Organizational
Psychologists and other analysts for years. For instance, developers of job performance models
typically must define job performance, figure out how to accurately measure it, decide which factors
influence it, and figure out how the influencing factors (called predictors) mathematically relate to
performance measures; none of these has proven to be a trivial undertaking. Despite the
difficulties in developing mathematical prediction models, the need for them exists, and the Air Force
continues to try to develop them.

To be successful in developing mathematical models for predicting job performance, the Air
Force must continue to accomplish the following items:

•  Define  job  performance. 

•  Develop valid and reliable measures of job performance as defined.

•  Apply  the  job  performance  measures  to  a  representative  sample  of  incumbent  airmen. 

•  Identify  factors  likely  affecting  job  performance  (predictors). 

•  Look  for  mathematical  relationships  between  predictor  variables  and  the  job  performance 
measures  of  the  airmen  sample,  and  identify  significant  relationships. 

•  Specify  an  appropriate  mathematical  model  that  relates  the  significant  predictor  variables  to 
job  performance  measures. 

•  Validate  the  mathematical  model,  perhaps  on  another  independent  sample  of  airmen. 

It is important to point out that the above items represent a continual, iterative process and not
a one-time-through list. The process can be viewed as having three distinct components or phases
which are illustrated in Figure 1.

The process components are job performance measurement, job performance modeling and
model validation. The process components and their subitems frequently require revisiting as
more job performance measurement knowledge is gained. Each job performance research effort
seems to contribute a little more to the job performance knowledge base while at the same time
creating as many new research questions as it answered. Progress toward development of sound
job performance measures and valid job performance models has been slow and has come in small
increments. Much progress is yet to be made.

[Figure 1: flow diagram of three components. Job Performance Measurement (define job performance; develop valid and reliable measures of job performance; collect job performance data), Job Performance Modeling, and Model Validation (validate the mathematical model).]

Figure 1. Job Performance Model Development Process

Whereas the joint-service JPM Project addressed mainly the job performance measurement
component of the modeling process, the current Productive Capacity Project is attempting to
address all of them. Initially under the Productive Capacity Project, the job performance
measurement component of the modeling process was addressed: job performance was defined,
experimental measures of job performance were developed, and the measures were applied to personnel in
four Air Force Specialties (AFSs) (21). The next step was to proceed to the job performance
modeling phase. This required identification of factors likely affecting job performance, specification of
mathematical relationships between such factors and job performance, and formulation of
prototype mathematical models expressing the relationship between the factors and job performance. It
was this job performance modeling phase that provided the basis for this thesis.

1.2 Statement of the Problem.


The  general  problem  facing  the  Air  Force  is  that  although  it  could  greatly  benefit  from  the 
ability  to  forecast  the  future  job  performance  of  applicants  and  incumbents,  it  is  currently  limited 
in  its  ability  to  do  so.  Development  of  job  performance  prediction  models  has  not  yet  progressed 
to  the  point  where  current  models  are  suitable  for  operational  use.  If  suitable  models  are  to  be 
developed,  the  model  development  process  must  continue. 

On a much smaller scale, the Air Force would like to use the data collected under the
Productive Capacity Project to develop experimental regression-based mathematical models for relating
job  performance  measures  to  certain  predictor  variables.  The  predictor  variables  the  Air  Force 
wishes  to  consider  are  mental  aptitude  and  job  experience. 

The purpose of this thesis was to address this smaller scale problem by performing the
required regression analyses on the Productive Capacity Project data to obtain the model parameter
estimates  needed  to  formulate  an  experimental  model.  In  terms  of  the  model  development  process 
expressed  in  Figure  1,  this  thesis  addressed  the  last  two  items  of  the  model  development  component, 
given  the  Productive  Capacity  Project  data  and  the  predictors,  aptitude  and  experience. 


1.3 Research Objectives.

1.3.1 Formulate a Productive Capacity Measure from Estimated Task Performance Times.

The job performance data collected under the Productive Capacity Project were in the form of
estimated performance times on various job tasks specific to each of four jobs studied.

In their raw form, the estimated performance times were of limited value. One problem with
them is that the raw times themselves communicate little about an individual’s relative level of job
performance. In order to assess performance level, one must first have knowledge about how others
perform on the tasks, such as how long on average it takes people to do the tasks. Another problem
with raw time data is that they do not have meaning outside of the associated tasks. The task
performance times have meaning only within a task within a job, and not across tasks or across jobs.
Comparing performance times across tasks is like the proverbial comparison of apples to oranges.
To illustrate these problems, consider a two-task scenario where the average times to complete the
two tasks are 10 and 20 minutes, respectively. Assume an individual completes the first task in 15
minutes, and the second in 15 minutes as well. Without also considering the average performance
times for the tasks, the individual’s performance times suggest that performance was comparable
on the tasks. But, when considering the average performance times, it can readily be seen that
the individual took significantly longer than average to complete the first, and considerably less
time than average to complete the second. There is obviously a difference in performance levels
across the two tasks that cannot be seen from the raw data.

This  implies  a  need  to  standardize  the  performance  time  data.  One  possible  standardization 
could be obtained through forming a ratio of the time data to a constant, say the task mean. This
transformation of the time data would have the desired effect of making the resulting measure
comparable  across  tasks.  Such  standardization  is  necessary  both  for  making  comparisons  across 
tasks  and  for  aggregating  task-level  data  into  overall  job-wide  measures  that  have  meaning  in  and 
of  themselves. 
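The two-task scenario described earlier can be worked through numerically. The following is only an illustrative sketch, assuming the task mean is used as the standardizing constant; the helper name `time_ratio` and the task labels are hypothetical and do not come from the Productive Capacity Project.

```python
# Illustrative sketch: standardize raw task times by dividing by the task mean.
# The helper name and task labels are hypothetical.

def time_ratio(raw_minutes: float, task_mean_minutes: float) -> float:
    """Return the raw time as a ratio of the task's average time (1.0 = average)."""
    if task_mean_minutes <= 0:
        raise ValueError("task mean must be positive")
    return raw_minutes / task_mean_minutes

task_means = {"task_1": 10.0, "task_2": 20.0}   # average completion times (minutes)
individual = {"task_1": 15.0, "task_2": 15.0}   # one individual's raw times (minutes)

ratios = {task: time_ratio(individual[task], task_means[task]) for task in task_means}
print(ratios)  # {'task_1': 1.5, 'task_2': 0.75}
```

The identical raw times of 15 minutes become 1.5 (slower than average) on the first task and 0.75 (faster than average) on the second, which makes the difference in performance level visible.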

The first research objective was, therefore, to find a suitable transformation of the performance
time data to standardize it. A transformation used by the Air Force in previous R&D efforts was
to create a productive capacity (PC) measure from the performance time data (5) (13) (21). A PC
measure is intended to express job performance in terms of how fast an airman can perform a piece
of work in reference to a standard performance time. It so happens that formulating a PC measure
from the task performance times can standardize the data, giving it broader interpretability. For
instance, the original PC formulation proposed by Carpenter, Monaco, O’Mara and Teachout is
t'/t, where t' is the fastest time in which a task can be completed and t is an individual’s raw
performance time (5;21). With this formulation, PC always ranges from zero to one and can
be interpreted as an individual’s output as a proportion of maximum possible output. Other PC
formulations also provide similarly helpful standardizations and interpretations.
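The t'/t formulation can be sketched in a few lines. This is a minimal illustration, not the project's software; the function name `productive_capacity` and the sample times are assumptions made for the example.

```python
# Illustrative sketch of the PC = t'/t formulation described above.
# The function name and the sample times are hypothetical.

def productive_capacity(fastest_minutes: float, raw_minutes: float) -> float:
    """Return PC = t'/t, where t' is the fastest possible time and t the raw time."""
    if fastest_minutes <= 0 or raw_minutes <= 0:
        raise ValueError("times must be positive")
    if raw_minutes < fastest_minutes:
        # By definition t' is the fastest time in which the task can be completed.
        raise ValueError("raw time cannot be shorter than the fastest possible time")
    return fastest_minutes / raw_minutes

fastest = 8.0                                   # assumed fastest time t' (minutes)
pc_slow = productive_capacity(fastest, 16.0)    # takes twice the fastest time
pc_best = productive_capacity(fastest, 8.0)     # matches the fastest time
print(pc_slow, pc_best)  # 0.5 1.0
```

An individual who needs twice the fastest time produces half of the maximum possible output (PC of 0.5), and PC reaches one only when the fastest time is matched.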

1.3.2  Select  a  Task  Weighting  Scheme.  The  second  research  objective  was  to  determine 
a  weighting  scheme  for  assigning  differing  levels  of  importance  to  tasks  when  aggregating  task-level 
measures into overall job or aggregate measures.

No  weighting  of  the  tasks  implies  that  the  performance  on  each  task  should  be  allowed  to 
equally  influence  overall  PC.  This  was  considered  a  questionable  practice  since  tasks  were  known 
to  differ  on  such  dimensions  as  criticality,  learning  difficulty,  time  required  to  perform  them,  and 
percent  of  time  airmen  spend  doing  them  (40).  Since  the  tasks  were  known  to  differ  in  importance 
along  such  dimensions,  it  was  recognized  that  one  or  more  dimensions  could  provide  numerical 
values to serve as task weights that would help in better defining overall PC.

The  second  objective  was,  then,  to  identify  an  appropriate  dimension  from  which  to  derive  a 
task  weighting  scheme,  followed  by  actual  computation  of  task  weights. 

1.3.3 Aggregate the Task-Level Data into an Overall Productive Capacity Measure.

The third objective was to determine an appropriate way of computing an individual’s overall
or aggregate productive capacity, using the PC measures computed at the task level. Task-level
performance data can provide some limited insight into airmen job performance, but of ultimate
importance to the Air Force is how well airmen perform overall. This is because Air Force jobs tend
to be multifaceted, requiring the performance of a variety of tasks. Jobs may also frequently change
in scope. Because Air Force jobs do tend to require a variety of task skills, task-level performance
data must be collapsed into overall measures that reflect an airman’s ability to meet a job’s overall,
multifaceted demands.

The  third  objective  was,  therefore,  to  determine  and  apply  a  means  of  aggregating  the  task- 
level  data  into  overall  measures  of  job  performance. 
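One simple way to combine a weighting scheme with task-level PC is a weighted mean. The sketch below assumes that form purely for illustration; this chapter does not state the aggregation rule or the weighting dimension actually adopted, and the names `aggregate_pc`, `task_pc` and `task_weights` and all values are hypothetical.

```python
# Illustrative sketch: aggregate task-level PC into an overall PC with a weighted mean.
# The weighted-mean form, the names, and the numbers are assumptions for illustration.

def aggregate_pc(task_pc: dict, task_weights: dict) -> float:
    """Return the weighted mean of task-level PC values."""
    total_weight = sum(task_weights[task] for task in task_pc)
    if total_weight <= 0:
        raise ValueError("task weights must sum to a positive value")
    weighted_sum = sum(task_pc[task] * task_weights[task] for task in task_pc)
    return weighted_sum / total_weight

task_pc = {"task_a": 0.5, "task_b": 1.0}        # task-level PC for one airman
task_weights = {"task_a": 1.0, "task_b": 3.0}   # e.g. relative task importance

overall_pc = aggregate_pc(task_pc, task_weights)
print(overall_pc)  # 0.875
```

With equal weights the same airman would have an overall PC of 0.75, so the choice of weighting dimension directly changes the aggregate measure.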


1.3.4 Develop Prediction Models. The fourth and most important objective was to
develop descriptive regression models for relating task-level and overall PC to the predictors, aptitude
and experience. The purpose of the regression models was to express how aptitude and experience
appear to affect PC.

Numerous possibilities existed for the functional form of regression models. Possibilities
considered included first-order and higher-order linear models, learning curve-type logarithmic models
and logistic models. The objective was to select a reasonable form for the regression models,
depending on the formulation of the PC measure, followed by estimation of the model parameters
using appropriate techniques. As an adjunct to the research objective, the model was evaluated
through residual analysis and through comparison of the model results to other performance
measures and previous studies.

In  short,  the  fourth  objective  was  to  select  an  appropriate  regression  model,  estimate  its 
parameters  and  analyze  its  results. 
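To make one of the candidate forms concrete, the sketch below evaluates a logistic function of a second-order expression in aptitude and experience. It is an assumption-laden illustration: the coefficient values, variable units and function names are hypothetical, and no estimation is performed.

```python
import math

# Illustrative sketch of a logistic model with a second-order linear predictor.
# All coefficients, units, and names are hypothetical; nothing here is estimated.

def second_order_terms(aptitude: float, experience: float) -> list:
    """Intercept, first-order, squared, and interaction terms."""
    return [1.0, aptitude, experience,
            aptitude ** 2, experience ** 2, aptitude * experience]

def linear_predictor(coefs: list, terms: list) -> float:
    """Sum of coefficient * term."""
    return sum(b * z for b, z in zip(coefs, terms))

def logistic(eta: float) -> float:
    """Map any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

coefs = [-1.0, 0.01, 0.02, 0.0, -0.0001, 0.0]    # hypothetical coefficients
eta = linear_predictor(coefs, second_order_terms(50.0, 36.0))
pc_hat = logistic(eta)
print(round(eta, 4), round(pc_hat, 3))
```

A logistic form keeps predicted values strictly between zero and one, which is compatible with a PC measure expressed as a proportion of maximum possible output.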

1.4  Scope. 

Under  the  Productive  Capacity  Project,  Leighton  and  others  collected  performance  data  on 
four  Air  Force  Specialties  (21).  This  thesis  will  concentrate  on  the  analysis  of  data  from  one  of  these 
jobs,  454X1,  Aerospace  Ground  Equipment.  It  was  limited  to  the  study  of  a  single  job  to  keep  the 
size  of  the  effort  manageable.  The  methodology  developed  via  this  single-job  research  should  find 
application  in  the  analysis  of  the  three  remaining  jobs  by  the  project  sponsor,  the  Manpower  and 
Personnel  Research  Division,  Human  Resources  Directorate,  Armstrong  Laboratory  (AL/HRM). 

1.5 Assumptions.

Throughout this research effort it was assumed the job performance measures derived from
the supervisors’ task time estimates are valid and reliable. In very general terms, valid means
that the measures accurately measure what they purport to measure, namely the true job performance of
the individuals studied. In equally general terms, reliability means that the PC measures can be
consistently collected. Siegel and Lane (1974) describe reliability as a demonstration that measures

do not fluctuate unduly over time as a result of something inherent in the test itself
(including scorer subjectivity), the transitory nature of the function being assessed,
or by factors extraneous to the particular behavior the test is designed to evaluate.
(37:125)

1.6  Limitations. 

A  significant  limitation  to  this  thesis  involves  the  interpretation  and  usability  of  the  results.  As 
mentioned,  the  goal  of  the  thesis  is  to  develop  an  experimental  mathematical  model  for  predicting 
the job performance of enlisted personnel in AFS 454X1, Aerospace Ground Equipment. The
experimental  model  is  to  provide  some  insight  into  how  the  predictors,  aptitude  and  experience, 
might  influence  an  experimental  measure  of  job  performance,  PC. 

It  must  be  stressed  that  the  PC  measurement  methodology  was  still  in  its  early  stages,  and 
the  current  PC  measure  was  previously  untested.  Also,  the  model  or  models  developed  as  part 
of  this  thesis  include  only  a  limited  number  of  possible  predictors.  The  results,  therefore,  are  not 
appropriate  for  use  in  operational  manpower  decisions  or  for  use  in  addressing  any  other  operational 
concerns.  The  results  are  suitable  for  providing  a  basis  for  future  research,  and  for  providing  very 
general  ideas  about  how  and  which  factors  might  affect  job  performance. 

1.7  Summary. 

The  Air  Force  has  recognized  that  it  could  benefit  from  measuring  and  predicting  the  job 
performance  of  both  its  current  personnel  and  its  applicants.  It  has  undertaken  several  research 
projects  with  the  aim  of  developing  valid  job  performance  measures.  The  Air  Force’s  most  recent 
R&D efforts have begun to investigate the potential uses of job performance measures in manpower
and personnel decisions, and force planning and modeling.

This thesis contributes to the Air Force’s R&D efforts by addressing the job performance
modeling phase of the job performance model development process (see Figure 1) using data collected
under the Productive Capacity Project. The remainder of this thesis documents this research.
Chapter 2 provides an in-depth discussion of background material reviewed as a first step in
understanding the relevant research issues. It provides an overview of the model development process,
and a chronology of previous research while highlighting those items relevant to the current research
objectives. Chapter 3 describes the research methodology used to prepare the data for analysis,
and further describes how the regression models were estimated. It includes details of the data
editing procedures, computation of aggregate PC measures, and the regression models used.
Chapter 4 provides the results and pertinent discussion concerning the research findings. It provides
regression results to include the estimated parameters and relevant statistics of model fit. Chapter
4 also includes correlational analyses of the model predicted values with other job performance
measures. It concludes with a graphical representation of the estimated models. And finally,
Chapter 5 provides a summary of the research, important conclusions and recommendations for further
research.

II. Literature Review


In  Chapter  1  it  was  explained  how  the  primary  research  objective  of  this  thesis  was  the 
development of experimental regression-based models for predicting job performance given the
subjects’  aptitude  and  experience.  With  this  in  mind,  this  chapter  provides  a  background  of 
information  relevant  to  this  thesis,  couched  in  terms  of  a  modeling  scenario.  The  review  thus 
begins  with  a  brief  overview  of  the  modeling  concept. 

2.1 Overview of Modeling.

Frequently,  R&D  efforts  involve  the  study  and  analysis  of  processes  or  systems.  Such  processes 
or  systems  are  often  very  complex  or  not  well  understood.  Usually  the  analyst  desires  to  study 
a  system  in  order  to  better  understand  it  and  to  try  to  specify  the  relationship  between  system 
inputs  and  outputs. 

Understanding  of  a  system  is  often  gained  and  advanced  through  development  of  a  model 
representing  the  system.  According  to  Law  and  Kelton,  a  model  is  an  abstract  “representation 
of a system developed for the purpose of studying that system” (20:3). Figure 2 depicts the
relationship  between  an  actual  system  and  the  system  model.  The  actual  system  usually  tends  to 
be  complex  and  the  relationship  between  the  inputs  and  outputs  is  usually  not  clearly  defined  or  well 
understood.  The  system  model  attempts  to  clearly  define  the  system  and  specify  the  relationships 
between  the  inputs  and  outputs. 

It  must  be  pointed  out  that  not  all  models  are  good  models.  Some  do  not  properly  represent 
the  system,  some  oversimplify  the  system  and  some  can  be  as  complex  as  the  system  itself.  In 
general,  a  good  model  is  one  which  is  as  simple  as  possible  while  still  adequately  representing  the 
associated system.

Figure 2. Graphical Representation of an Actual System Related to a System Model

Figure 3. Graphical Representation of a Mathematical Model

There  are  many  types  of  models.  These  include  mathematical  models,  conceptual  models, 
computer  models,  and  simulation  models,  to  name  a  few.  Of  primary  concern  to  the  current 
research  were  mathematical  models  because  these  were  the  type  requiring  development. 

A  mathematical  model  is  a  model  in  which  the  system  is  represented  as  a  mathematical 
relationship between the system inputs and system outputs. In general, a mathematical model can
be expressed as in Figure 3.

The black box in Figure 3 represents the mathematical model which is generally some
mathematical function of the input variables. Derivation of the mathematical function relating the inputs
and output generally involves rigorous experimentation and statistical analyses to answer questions
like the following:

1.  Which  input  variables  should  be  included? 

2.  Should  the  variables  be  examined  in  their  original  form,  or  should  they  be  transformed? 

3.  How  complex  a  model  is  necessary?  (4:4-6) 

Answering questions like those above requires application of one or more mathematical techniques.
One frequently applied technique is regression analysis which is often used in analysis of linear
mathematical models. Linear mathematical models and linear regression are discussed in the
following section.

2.1.1 Linear Models and Linear Regression. Linear mathematical models, like all
mathematical models, relate system output to inputs via mathematical functions. In linear models
and most regression applications, the input variables are frequently referred to as predictors, and
the output is often called the response. The predictors are often referred to as $x_i$, and the response as
$Y$. What distinguishes a linear model from other mathematical models is that the mathematical
function relating the response to the predictors is linear with respect to the coefficients associated
with the function’s terms. In other words, a linear model is one that can be expressed as in
Equation 1 (4:36).

$$Y = \beta_1 Z_1 + \beta_2 Z_2 + \cdots + \beta_p Z_p + \epsilon \qquad (1)$$

where

    $Y$ = the response
    $Z_1, Z_2, \ldots, Z_p$ = specified functions of the predictor(s), $x_i$
    $\beta_1, \beta_2, \ldots, \beta_p$ = model coefficients, or parameters
    $\epsilon$ = model error term, representing the deviation of the system data points from the underlying model.

Note that for a model to be linear, it need only be linear in terms of the $\beta$ coefficients. The functions, $Z$, need not be linear functions of the predictors. The $Z$ functions are often higher-order forms of the predictors (e.g., $x_i^2$) or interaction terms (e.g., $x_i x_j$) to account for curvature in the curve or surface defined by the model. The actual specification of the $Z$ functions depends on the nature of the data and the underlying mathematical relationship between the predictors and the response. The analyst frequently includes various $Z$ functions because of prior knowledge or hypotheses about the system under study. Also, preliminary analyses and data exploration can provide insight as to the specification of the $Z$s.
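To make the idea of "linear in the coefficients" concrete, the following minimal sketch builds a set of $Z$ terms from two predictors and evaluates the deterministic part of Equation 1. The particular terms (intercept, linear, quadratic and interaction), the function names and the coefficient values are illustrative assumptions, not taken from the thesis data.

```python
def z_terms(x1, x2):
    """Return the Z functions for one observation with two predictors.

    The choice of terms (intercept, linear, quadratic, interaction) is
    illustrative only; it is not the specification used in the thesis.
    """
    return [1.0, x1, x2, x1 ** 2, x1 * x2]


def linear_model(betas, z):
    """Evaluate beta_1*Z_1 + ... + beta_p*Z_p (the error term is omitted)."""
    return sum(b * zi for b, zi in zip(betas, z))


# Hypothetical coefficients: the response is curved in x1, yet the model
# is still a linear combination of the betas.
betas = [2.0, 0.5, -1.0, 0.25, 0.1]
value = linear_model(betas, z_terms(2.0, 3.0))
print(value)
```

Because the squared and interaction terms are computed before the coefficients are applied, the fitting problem stays linear no matter how curved the response surface is in the original predictors.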
It is important at this point to make a distinction between mechanistic and empirical mathematical models (4:10-11). A mechanistic model refers to the true underlying mathematical relationship between the input and output variables. An empirical model is an approximation of the true relationship, estimated from data sampled from the system in question. Specification of the actual mechanistic model is almost always impossible or impractical due to such things as measurement and sampling error, and limited data. Therefore, it is usually the goal of the analyst to derive an empirical model using a sample of system data.

Given a linear mathematical model of the form expressed in Equation 1, linear regression analysis is frequently performed to aid in deriving the empirical model. Linear regression is a technique for obtaining estimates of the $\beta$ parameters, given a set of predictor and response data. After estimation of the parameters, an empirical linear mathematical model can be expressed for the system in question.

Neter, Wasserman and Kutner describe regression models as serving three primary purposes (27:31). These are description, control and prediction. To use a regression model for description means to estimate the model parameters so that the relationship between the variables can be specified and the model can thus be used to describe the system. To use regression models for control means to specify the relationship between the predictors and response so that system specifications can be adhered to. Finally, as the name implies, prediction means the use of regression models to predict or forecast the system response given known levels of the predictors. The three purposes may overlap in a given study. It was previously mentioned that the Air Force would like to develop models for prediction of airman job performance. This thesis was designed to contribute to the model development process by developing regression models more for description than prediction. Development of such descriptive models is an integral part of the model development process as efforts are made to better understand the nature of the relationship between potential predictors and the response.



The most common method of obtaining estimates for the $\beta$ parameters in linear regression is the method of least squares. In least squares, the model parameters are estimated such that the resulting equation represents a response curve or surface that minimizes the sum of the squared distances from the actual data points to the estimated curve or surface. Application of linear regression requires the assumptions that the values of the predictor variables for a given set of data are known constants, and also that the $\beta$s are constants that require estimation. Linear regression further assumes that the model error terms, $\epsilon$, are independent random variables with a mean of zero. That is to say, given a fixed level of the predictors, on repeated sampling, the error is assumed to be distributed such that its mean is zero. This means that the expected value of the response, $Y$ (denoted $E(Y)$), is $\beta_1 Z_1 + \beta_2 Z_2 + \cdots + \beta_p Z_p$, since the $\beta$s and $Z$s are constants.

Least squares is concerned with minimizing the squared distance between each observed $Y$ and its expected value, $\beta_1 Z_1 + \beta_2 Z_2 + \cdots + \beta_p Z_p$. The quantity to be minimized in least squares is expressed in Equation 2 (27:39).


$$Q = \sum_{i=1}^{n} \left( Y_i - \beta_1 Z_{1i} - \beta_2 Z_{2i} - \cdots - \beta_p Z_{pi} \right)^2 \qquad (2)$$

where

    $Q$ = the expressed sum
    $i$ = observation number
    $n$ = total number of observations
    $Y_i$ = the response for observation $i$
    $Z_{1i}, Z_{2i}, \ldots, Z_{pi}$ = specified functions of the predictor(s) for observation $i$
    $\beta_1, \beta_2, \ldots, \beta_p$ = parameters to be estimated.

Equation 2 is minimized with respect to the $\beta$s using standard calculus minimization techniques. The minimization yields the least squares estimators for the $\beta$s. The estimators are frequently referred to as the $\hat{\beta}$s. Least squares estimators, or $\hat{\beta}$s, have the appealing property of being minimum variance, unbiased estimators of the actual $\beta$s (among linear unbiased estimators, given uncorrelated error terms with constant variance). Having computed the $\hat{\beta}$s, the empirical regression model can be stated and the system response can be estimated, or predicted, given specified levels of the predictors. The estimated response is frequently referred to as $\hat{Y}$.
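As a hedged illustration of the minimization, setting the partial derivatives of $Q$ with respect to each $\beta$ to zero gives the normal equations $(Z'Z)\hat{\beta} = Z'Y$. The sketch below solves them with plain Gaussian elimination. The function names and the five data points are made up for illustration and are not thesis data; a production analysis would normally rely on a statistical package instead.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(z_rows, y):
    """Return the estimates minimizing Q, given rows of Z terms and responses."""
    p = len(z_rows[0])
    ztz = [[sum(row[j] * row[k] for row in z_rows) for k in range(p)]
           for j in range(p)]
    zty = [sum(row[j] * yi for row, yi in zip(z_rows, y)) for j in range(p)]
    return solve(ztz, zty)


# Illustrative data only (not from the thesis).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
z_rows = [[1.0, xi] for xi in x]  # Z_1 = 1 (intercept), Z_2 = x
b_hat = least_squares(z_rows, y)
print(b_hat)
```

For these points the estimates come out close to an intercept of 0.05 and a slope of 1.99.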

It was previously mentioned that it was assumed in linear regression that the error terms, $\epsilon$, were distributed for a given level of the predictors such that their mean is zero. It is often further assumed that the $\epsilon$ not only have a mean of zero, but are normally distributed with mean zero and variance $\sigma^2$ ($\epsilon \sim N(0, \sigma^2)$). The normality assumption allows certain statistical inferences to be made concerning the regression results.

Prior to discussing statistical inferences about the regression results, the following discussion is included to show that the assumption of error terms being distributed $N(0, \sigma^2)$ implies that the $Y$s are likewise distributed normally. This result has a direct impact on the statistical inferences which can be made. Consider Equation 1 and assume $\epsilon \sim N(0, \sigma^2)$. Since the predictors and the model parameters are constants, $Y$ can be shown to be distributed with variance $\sigma^2$, the same variance as the error term. Since the predictors and the parameters are constants, let the right-hand side of Equation 1 be expressed as $c + \epsilon$, where $c$ is a constant. Next, let the variance of $Y$ (denoted as $V(Y)$) be written as $V(c + \epsilon)$, which equals simply $V(\epsilon)$. It follows then that $V(Y) = V(\epsilon) = \sigma^2$. Further, since it is assumed that $\epsilon$ is distributed $N(0, \sigma^2)$ and that $Y = c + \epsilon$, $Y$ not only has a variance $\sigma^2$, it is distributed $N(c, \sigma^2)$.

The assumption that the error terms, $\epsilon$, and thus the $Y$s are normally distributed is important when making inferences concerning the $\beta$s. Since it can be shown that the $\hat{\beta}$s are linear combinations of the $Y$s, the $\hat{\beta}$s are likewise normally distributed. This fact means that the $t$ distribution can be used to make inferences about the $\beta$s. The following discussion of inferential statistics commonly used with linear regression is an overview of the more in-depth coverage given by Neter, Wasserman and Kutner (27).

Following linear regression, it is common to test whether a given $\beta_k$ is significantly different from zero. Following are the null and alternative hypotheses for such a test.

$$H_0: \beta_k = 0$$

$$H_a: \beta_k \neq 0$$

The test statistic $t_0$ is computed as $t_0 = \hat{\beta}_k / s(\hat{\beta}_k)$, where $s(\hat{\beta}_k)$ is the estimated standard error of $\hat{\beta}_k$. The decision rule for deciding the outcome of the test is as follows.

If $|t_0| \le t(1 - \alpha/2;\ n - p)$, conclude $H_0$
If $|t_0| > t(1 - \alpha/2;\ n - p)$, conclude $H_a$

Here, $\alpha$ represents the preselected probability of Type I error, which means $\alpha$ is the probability that $H_a$ will be concluded when $H_0$ is true. Also, $n$ is the number of cases on which the regression is based and $p$ is the number of $\beta$ parameters included in the model.
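The test above can be sketched for the simplest case, a straight-line model with an intercept and one predictor ($p = 2$), where the estimated standard error of the slope is $\sqrt{MSE / \sum (x_i - \bar{x})^2}$. The function names and data are illustrative assumptions, not thesis results, and the critical value 3.182 is the tabled $t(0.975;\ 3)$ for $\alpha = 0.05$ and $n - p = 3$, supplied by hand rather than computed.

```python
import math


def slope_t_statistic(x, y):
    """Return (b1, s_b1, t0) for the straight-line model Y = b0 + b1*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)            # p = 2 parameters were estimated
    s_b1 = math.sqrt(mse / sxx)    # estimated standard error of b1
    return b1, s_b1, b1 / s_b1


def t_test(t0, t_crit):
    """Apply the decision rule: Ha if |t0| exceeds the critical value, else Ho."""
    return "Ha" if abs(t0) > t_crit else "Ho"


# Illustrative data only (not from the thesis).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, s_b1, t0 = slope_t_statistic(x, y)
T_CRIT = 3.182  # t(0.975; 3), read from a t table
print(b1, s_b1, t0, t_test(t0, T_CRIT))
```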

En route to discussing further statistical tests of regression results, it is necessary to introduce the concept of sums of squares. The sums of squares concept involves the partitioning of the sum of the squared deviations of the $Y$s from the average $Y$. The sum of the squared deviations of the $Y$s from $\bar{Y}$ is referred to as the total sum of squares and is expressed in Equation 3.

$$SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \qquad (3)$$

where

    $SSTO$ = total sum of squares
    $i$ = observation number
    $n$ = total number of observations
    $Y_i$ = the response for observation $i$
    $\bar{Y}$ = the average response

The  total  sum  of  squares  can  be  viewed  as  a  measure  of  total  variation  of  the  Ys  from  the 
mean  response  (27:87).  SSTO  can  be  partitioned  into  two  pieces,  sum  of  squares  for  error  and  sum 
of  squares  for  regression.  These  are  expressed  in  Equation  4  and  Equation  5,  respectively. 


$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \qquad (4)$$

where

    $SSE$ = sum of squares for error
    $i$ = observation number
    $n$ = total number of observations
    $Y_i$ = the response for observation $i$
    $\hat{Y}_i$ = the estimated response for observation $i$

$$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \qquad (5)$$

where

    $SSR$ = sum of squares for regression
    $i$ = observation number
    $n$ = total number of observations
    $\hat{Y}_i$ = the estimated response for observation $i$
    $\bar{Y}$ = the average response

The sum of squares for error represents a measure of variation of the observed data with respect to the estimated model. The sum of squares for regression represents the variation of the estimated response values about the mean response. Again, note that the total deviation of the response from the average response (SSTO) can be partitioned into the deviation of the observed response values from the estimated response values (SSE) and the deviation of the estimated response values from the mean (SSR). Equation 6 and Equation 7 summarize the relationship between SSTO, SSE and SSR (27:87-89).

$$SSTO = SSE + SSR \qquad (6)$$

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \qquad (7)$$

where

    $i$ = observation number
    $n$ = total number of observations
    $Y_i - \bar{Y}$ = total deviation
    $\hat{Y}_i - \bar{Y}$ = deviation of estimated response around mean
    $Y_i - \hat{Y}_i$ = deviation of observed response around estimated response.
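The partition in Equations 6 and 7 can be checked numerically. The sketch below computes the three sums of squares for a small made-up data set (not thesis data) whose fitted values come from a least squares straight line, $0.05 + 1.99x$ for $x = 1, \ldots, 5$. The identity holds for least squares fits that include an intercept term; the function name is a hypothetical helper.

```python
def sums_of_squares(y, y_hat):
    """Return (SSTO, SSE, SSR) for observed responses y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ssto = sum((yi - y_bar) ** 2 for yi in y)               # Equation 3
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # Equation 4
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)            # Equation 5
    return ssto, sse, ssr


# Illustrative data only; y_hat is the least squares line 0.05 + 1.99*x.
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [2.04, 4.03, 6.02, 8.01, 10.00]
ssto, sse, ssr = sums_of_squares(y, y_hat)
print(ssto, sse, ssr, sse + ssr)
```

If the fitted values did not come from a least squares fit, SSE and SSR would generally not add up to SSTO, which makes the check a quick sanity test on a regression routine.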


After computation of the sums of squares, mean squares can be computed. The mean square for regression (MSR) and mean square for error (MSE) are computed by dividing the associated sums of squares by their corresponding degrees of freedom (df). Degrees of freedom, in general terms, refers to the number of opportunities in which variables are free to vary, given a set of data. For instance, SSTO has $n - 1$ df, where $n$ is the number of observations in the sample. One degree of freedom is lost because the deviations $Y_i - \bar{Y}$ must, by definition, sum to zero. This means that $n - 1$ of the $Y$ observations are free to vary, leaving the last observation no freedom to vary. It can be equivalently stated that one degree of freedom is lost because $\bar{Y}$ was used to estimate the true system mean (27:91). For SSE, there are $n - p$ degrees of freedom, where $p$ is the number of parameters estimated. One degree of freedom is lost for each estimated parameter. SSR has associated with it $p - 1$ df. There are $p$ parameters in the model but one degree of freedom is lost because, by definition, the deviations $\hat{Y}_i - \bar{Y}$ must sum to zero. Thus, $p - 1$ parameters are free to vary but the last one is not. Equation 8 and Equation 9 show the computations for MSE and MSR, respectively.


$$MSE = \frac{SSE}{n - p} \qquad (8)$$

where

    $MSE$ = mean square for error
    $SSE$ = sum of squares for error
    $n$ = the number of observations in the sample
    $p$ = the number of parameters included in the model
    $n - p$ = the degrees of freedom for associated sum of squares (SSE).

$$MSR = \frac{SSR}{p - 1} \qquad (9)$$

where

    $MSR$ = mean square for regression
    $SSR$ = sum of squares for regression
    $p$ = the number of parameters included in the model
    $p - 1$ = the degrees of freedom for associated sum of squares (SSR).


Having computed the mean squares, a common statistical test for overall regression relation can be performed. This test makes use of the fact that, given the previous linear regression model assumptions and when the null hypothesis below holds, the value $MSR/MSE$ is distributed according to the $F$ distribution. The null and alternative hypotheses for the test are as follows.

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$$

$$H_a: \text{not all the } \beta_k\ (k = 1, \ldots, p - 1) = 0$$

Again, $n$ is the number of cases included in the regression and $p$ is the number of parameters included in the model.


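Table 1. General ANOVA Table for a Linear Regression Model

Source of
Variation     SS                                                df        MS                     F_0
Regression    $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$    $p - 1$   $MSR = SSR/(p - 1)$    $F_0 = MSR/MSE$
Error         $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$        $n - p$   $MSE = SSE/(n - p)$
Total         $SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$         $n - 1$
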

The test statistic $F_0$ is computed as $F_0 = MSR/MSE$, where MSR and MSE are the model mean square for regression and mean square for error, respectively. The decision rule for selecting a hypothesis is as follows.

If $F_0 \le F(1 - \alpha;\ p - 1,\ n - p)$, conclude $H_0$
If $F_0 > F(1 - \alpha;\ p - 1,\ n - p)$, conclude $H_a$

The  F  test  for  regression  relation  serves  primarily  to  determine  whether  any  of  the  predictor 
variables,  in  their  proposed  format,  are  providing  any  statistically  significant  prediction  of  the 
response  variable. 
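The mean squares and the overall $F$ statistic can be sketched as follows. The function name, the data and the fitted values are illustrative assumptions (a made-up straight-line fit with $p = 2$), and the critical value 10.13 is the tabled $F(0.95;\ 1,\ 3)$, entered by hand rather than computed.

```python
def anova(y, y_hat, p):
    """Return sums of squares, mean squares and F0 for a fit with p parameters."""
    n = len(y)
    y_bar = sum(y) / n
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    msr = ssr / (p - 1)   # Equation 9
    mse = sse / (n - p)   # Equation 8
    return {"SSR": ssr, "SSE": sse, "SSTO": ssto,
            "MSR": msr, "MSE": mse, "F0": msr / mse}


# Illustrative data only; y_hat is the least squares line 0.05 + 1.99*x.
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [2.04, 4.03, 6.02, 8.01, 10.00]
table = anova(y, y_hat, p=2)
F_CRIT = 10.13  # F(0.95; 1, 3), read from an F table
decision = "Ha" if table["F0"] > F_CRIT else "Ho"
print(table["F0"], decision)
```

With a single predictor, $F_0$ equals the square of the $t$ statistic for the slope, so the two tests lead to the same conclusion in that special case.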

After computation of the sums of squares, mean squares and the $F$ statistic for overall regression relation, linear regression results are frequently summarized with an analysis of variance (ANOVA) table. An ANOVA table is presented in Table 1.

Recall that the above statistical tests require the assumption that the error terms, $\epsilon$, are independent random variables distributed $N(0, \sigma^2)$. The assumption of normality of the error terms is frequently tested through analysis of the residuals. Residual is another name for the deviation of an observed response from its predicted value, $Y_i - \hat{Y}_i$. (Residuals are often denoted as $e$.) Residuals are frequently analyzed to determine the aptness of the proposed regression model. The error terms ($\epsilon_i = Y_i - E(Y_i)$) themselves cannot be analyzed because the true mechanistic regression model ($E(Y_i)$) is unknown and thus the error terms are unknown. To analyze the residuals, they are frequently plotted against the estimated response values and the predictor variables. These plots indicate whether the variance of the residuals (and thus the variance of the error terms) is in fact constant ($\sigma^2$) over varying levels of the other variables. Such constancy of variance is called homoscedasticity. The residuals are also frequently plotted against the expected residuals given a normal distribution. This is called a normal probability plot and, as the name implies, will indicate whether the residuals (and thus the error terms) appear normally distributed.
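The coordinates of a normal probability plot can be sketched as below. The expected value of the $k$th smallest residual is approximated here as $\sqrt{MSE}\, z[(k - 0.375)/(n + 0.25)]$, one common plotting-position approximation; that choice, the function name and the residual values are assumptions for illustration and are not taken from the thesis.

```python
import math
from statistics import NormalDist


def normal_plot_points(residuals, mse):
    """Pair each ordered residual with its approximate expected value under normality."""
    n = len(residuals)
    ordered = sorted(residuals)
    scale = math.sqrt(mse)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    expected = [scale * standard_normal.inv_cdf((k - 0.375) / (n + 0.25))
                for k in range(1, n + 1)]
    return list(zip(expected, ordered))


# Illustrative residuals only (n = 5, p = 2, so MSE = SSE / 3).
residuals = [0.06, -0.13, 0.18, -0.21, 0.10]
mse = sum(e ** 2 for e in residuals) / 3
points = normal_plot_points(residuals, mse)
for expected_value, observed in points:
    print(round(expected_value, 4), observed)
```

Points that fall near a straight line when plotted suggest that the normality assumption is reasonable; strong curvature suggests otherwise.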

If, after residual analysis, it appears that the estimated regression model is not apt, often either the predictor variables or the response (or both) can be mathematically transformed to make it so. Neter, Wasserman and Kutner discuss several such transformations (27).

As mentioned previously in the discussion of general modeling, not all models are good models. In linear regression, the goodness of model fit is frequently assessed through the statistic $R^2$. $R^2$ is called the coefficient of multiple determination and is interpreted as the proportion of variance in the response that is explained by the estimated model. The computation for $R^2$ is shown in Equation 10. A high $R^2$ indicates the estimated empirical model fits the data well and thus may provide reasonable prediction results.


$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO} \qquad (10)$$


The above tests and statistics illustrate only some of the more common descriptive and inferential statistics applied to linear regression results. While there are numerous other tests, those discussed above are employed throughout this thesis and thus required review at this time.

The previous discussion of linear mathematical models and linear regression explained how model parameters are estimated and how statistical inferences can be made concerning the regression results. The previous discussion assumed that a suitable regression model was used. Constructing a suitable linear regression model can be a very involved process. Neter, Wasserman and Kutner describe the model-building process as involving the following four phases (27:433).

1.  Data  collection  and  preparation. 

2.  Reduction  of  the  number  of  predictor  variables. 

3.  Model  refinement  and  selection. 

4.  Model  validation. 

These four phases are graphically depicted in Figure 4 (27:434). Note the relationship between Figure 4 and Figure 1, which represents the job performance model development process. Collapsing the second and third phases of Figure 4 into one phase would make the two figures highly comparable. This means that the process for developing a mathematical job performance model is virtually the same as the process for developing any linear regression model. This process can generally be extrapolated to any mathematical model development.

The first phase of the regression model building process, data collection and preparation, involves the gathering of the data, preferably through some designed experiment which will yield the type of data needed to answer the research questions. Following collection of the data, the data must be prepared for analysis. Data preparation may involve screening out any predictor variables which are not fundamental to the problem, which are subject to large measurement error, or which duplicate other predictors (27:435). Data preparation also involves editing of the data to remove any gross data errors, and identification of any extreme outlying observations which can adversely influence regression analyses. Useful tools for identifying data errors and outlying cases include scatterplots, histograms and frequency distributions of the predictors and response.

[Figure 4. Strategy for Building a Regression Model. Flowchart of the four phases: data collection and preparation; reduction of number of predictor variables; model refinement and selection; model validation.]

The second phase of the model building process involves the reduction of the number of predictor terms. Once the functional form of the regression relation has been decided upon (whether the predictor or response variables are to appear in linear form, quadratic form, logarithmic form, etc.), the next step is to select a good subset (or subsets) of the predictor terms ($Z$s) (27:43). Recall from the previous general discussion of modeling that a good model is one that not only adequately represents the underlying system, but is also as simple as possible. This is the reason for reducing the number of predictor terms, if possible. One common technique for reducing the number of predictor terms is stepwise linear regression. Stepwise regression is an automatic search procedure that sequentially develops the subset of predictor terms to include in the model. In very general terms, stepwise regression sequentially adds predictor terms to the regression model and computes an $F$ statistic for each term. Predictor terms are added to or removed from the model based on whether their associated computed $F$ statistics, considering other variables currently included in the model, exceed or fall below prespecified $F$ statistic criteria (27:453-454). Stepwise regression can be an efficient way of obtaining a single, parsimonious (simple) regression model.
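The search described above can be illustrated with a simplified forward-only variant: at each step, the candidate term with the largest partial $F$ statistic, $(SSE_{reduced} - SSE_{full}) / (SSE_{full} / df_{error})$, is added if that statistic exceeds an entry threshold. This is a sketch under stated assumptions and not the full stepwise procedure (it never removes a term). The function names, the threshold of 4.0 and the data are hypothetical, and the code assumes no candidate is perfectly collinear with the terms already chosen and that no fit is exact.

```python
def sse_of_fit(columns, y):
    """Least squares SSE for a model whose Z terms are the given columns."""
    p, n = len(columns), len(y)
    # Normal equations (Z'Z) b = Z'Y, stored as an augmented matrix.
    m = [[sum(columns[j][i] * columns[k][i] for i in range(n)) for k in range(p)]
         + [sum(columns[j][i] * y[i] for i in range(n))] for j in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    b = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        b[r] = (m[r][p] - sum(m[r][c] * b[c] for c in range(r + 1, p))) / m[r][r]
    fitted = [sum(b[j] * columns[j][i] for j in range(p)) for i in range(n)]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_select(candidates, y, f_enter):
    """Add terms one at a time while the best partial F exceeds f_enter."""
    n = len(y)
    chosen = []                 # names of the selected terms, in entry order
    columns = [[1.0] * n]       # the intercept is always included
    sse_now = sse_of_fit(columns, y)
    while len(chosen) < len(candidates):
        best = None
        for name, col in candidates.items():
            if name in chosen:
                continue
            sse_new = sse_of_fit(columns + [col], y)
            df_error = n - (len(columns) + 1)
            f_stat = (sse_now - sse_new) / (sse_new / df_error)
            if best is None or f_stat > best[0]:
                best = (f_stat, name, col, sse_new)
        if best[0] <= f_enter:
            break
        chosen.append(best[1])
        columns.append(best[2])
        sse_now = best[3]
    return chosen


# Illustrative data only: y depends on x1 plus small noise; x2 is unrelated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [5.1, 6.8, 9.05, 11.15, 12.9, 15.2, 16.85, 18.95]
selected = forward_select({"x1": x1, "x2": x2}, y, f_enter=4.0)
print(selected)
```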

The third phase, model refinement and selection, involves study and improvement of the model(s) resulting after reducing the number of predictor terms. In this stage, the data are checked in detail for overlooked evidence of curvature and interaction effects. The model assumptions are checked through residual analysis, and diagnostics are performed to identify such things as severe outlying observations (27:437-438). Also, remedial measures such as data transformations are made if necessary. The result of this phase is the identification of a single model which most adequately and parsimoniously represents the system under study.

The last phase of the model building process is model validation. Model validation involves the assessment of the model in terms of its generalizability to the overall system, and not just to the data from which it was created. Model validation usually involves checking the model against new data, theoretical expectations, earlier results or simulation results (27:465).
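One simple way to check a fitted model against new data is to compare its mean squared prediction error on observations not used in the fit with the MSE from the fitting sample. The sketch below assumes a previously fitted straight line with intercept 0.05 and slope 1.99; the function name and all numbers are hypothetical and serve only to show the calculation.

```python
def mean_squared_prediction_error(y_new, y_pred):
    """Average squared difference between new responses and model predictions."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y_new, y_pred)) / len(y_new)


# Hypothetical fitted model and hold-out observations (not thesis data).
b0, b1 = 0.05, 1.99
x_new = [6.0, 7.0]
y_new = [12.2, 13.8]
y_pred = [b0 + b1 * xi for xi in x_new]
mspe = mean_squared_prediction_error(y_new, y_pred)
print(mspe)
```

A prediction error on new data that is close to the MSE of the fitting sample suggests the model generalizes; a much larger value suggests it was tailored too closely to the original data.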

Having provided a general overview of modeling with emphasis on mathematical models, and namely linear models, focus will now turn to the Air Force's most recent R&D concerning the measurement and subsequent modeling of airmen job performance. First, however, the next section will provide the reasons that the Air Force is interested in job performance measurement and modeling.

2.2 Air Force Interest in Job Performance Research.

2.2.1 Air Force Interest in Measuring Job Performance. Aside from the need to obtain job performance measures for modeling, there are other reasons that virtually all large, success-dependent organizations are interested in measuring the job performance of their personnel. The first chapter mentioned some operational and congressionally mandated requirements which sparked the Air Force's interest in measuring job performance. Wayne Cascio provides the following reasons that organizations in general are interested in having job performance measures (6:74).

1.  Performance  measures  can  serve  as  a  basis  for  making  personnel  decisions  such  as  who  to  fire, 
who  to  reward,  and  who  to  promote. 

2. Performance measures can be used as criteria for assessing the impact of any number of personality or situational variables on job performance.

3. Performance measures can serve as predictors of future performance.

4. Performance measures can help assess training programs and establish training objectives.

5. Performance measures can provide feedback to employees.

6. Performance measures can help in diagnosing and developing organizations.

The Air Force is interested in measuring job performance for these reasons as well. What Cascio is saying is that job performance measures can give an organization the ability to improve its manpower and personnel systems and practices in numerous ways. Coupling Cascio's reasons with operational requirements like those discussed in Chapter 1 provides the Air Force with several compelling reasons to pursue job performance measurement research.

2.2.2 Air Force Interest in Modeling Job Performance. As with measuring job performance, most organizations are interested in modeling, for prediction purposes, the job performance of their incumbent personnel and those individuals who have not yet joined the organization. The Air Force can certainly be counted among the organizations interested in modeling performance. Much can be gained by predicting the future performance of applicants. Such prediction could help in the hiring, or enlistment, process. If the Air Force could assess ahead of time who is likely to be most productive or successful, it could ensure that such individuals are enlisted, while avoiding those who are least likely to be productive.

On a grander scale, the modeling of job performance could be useful in manpower planning. Predicted job performance resulting from models could be used as a basis for allocating personnel to various jobs according to some desired goal. For instance, if the Air Force could predict job performance, it could assign its personnel to ensure that maximum possible levels of productive capacity, or readiness, are obtained. Simply put, the ability to predict job performance can help an organization to make optimal use of its personnel resources.

2.3  Air  Force  Measurement  of  Job  Performance. 

Prior to reviewing relevant job performance literature, job performance models must be couched in terms of the previous modeling discussion. Figure 5 illustrates a mathematical job performance model using the same type of graphical representation shown previously. In modeling job performance, the system is, in essence, a typical worker (in the current research, a typical airman). The inputs are the many factors known to influence a worker's job performance. The output, or response, is the worker's actual job performance. The system (worker) is modeled as a mathematical function. In the mathematical model of job performance, the inputs are known levels of selected predictors. The output, or estimated response, is an estimate of some measure of job performance.

Figure 5. Graphical Representation of a Job Performance Model

It is important to note that to build a mathematical job performance model, or any mathematical model for that matter, sound measures of the response must be obtained. For a job performance model, job performance is the response, so job performance measures must be obtained. It will be shown that the development and collection of valid and reliable job performance measures can be a very involved process. Prior to discussing the Air Force's development of job performance measures, the next section provides an expanded definition of a job performance measure, and definitions of other key terms in job performance measurement.

2.3.1 Definitions.

2.3.1.1 Job Performance Measure. A job performance measure is a criterion used to assess the quality or amount of work completed.

The term job performance measure generally refers to the formal, valid measurement criteria used by individuals who have a professional interest in assessing and quantifying work performed on a particular job. Job performance measures are generally not associated with informal subjective assessments or opinions concerning work completed.

Although job performance measures can theoretically be used to evaluate work done by individual workers, groups, or machines, they are usually applied to individual workers.

There  are  numerous  possible  schemes  for  classifying  job  performance  measures.  An  important 
scheme  that  will  be  considered  in  this  thesis  classifies  measures  as  quality-based  or  quantity-based 
job  performance  measures. 

2.3.1.2 Quality-Based Measures. Quality-based job performance measures are those measures which reflect how well work is accomplished. Quality-based measures include such things as subjective ratings of the quality of work, or the percentage of steps performed correctly while completing a task.

2.3.1.3 Quantity-Based Measures. In contrast to quality-based measures, quantity-based measures of job performance are those that are concerned with how much work is accomplished, referenced to time. Some examples of quantity-based measures include the number of parts made per hour, or just the time it takes to complete a job task. It is generally the case for quantity-based measures that more is better. In other words, shorter work completion times are desirable. Shorter work completion times equate to higher worker output rates. This is obviously desirable for an organization, provided the greater worker output is not at the expense of the worker.

The distinction between quality-based and quantity-based measures is important to the Air Force. This is because, although the Air Force is interested in both quantity and quality, quantity-based measures seem to be more frequently used in most Air Force manpower modeling and planning. The reason is that overall work output is usually the object of interest in planning and modeling exercises. For instance, the Air Force currently focuses on such readiness measures as sortie generation rates and mean time to repair aircraft. Only quantity-based job performance measures contain the work output information needed to assess such readiness measures. But the Air Force realizes that quantity, without quality considerations, is not sufficient or desirable. Thus, there usually is a simultaneous interest in quantity and quality. This is especially true of the Air Force and most production-oriented organizations.

The Air Force’s most recent job performance measurement research, under the Productive
Capacity Project, has focused on quantity-based measures, with an attempt to build at least a
minimum acceptable quality consideration into these measures.

2.3.1.4 Quantity/Quality Tradeoff. There is a commonly accepted notion of a
quantity/quality tradeoff when performing work. The notion is that, all things being equal, the
quality  of  work  will  decrease  as  the  available  time  to  put  into  the  work  decreases.  Or  similarly,  the 
amount  of  time  to  complete  a  piece  of  work  will  increase  as  the  attention  given  to  quality  of  the 
work  increases.  It  is  believed  that  quantity  and  quality  are  directly  related. 

This  tradeoff  suggests  that  it  is  possible  to  allow  too  little  time  to  complete  a  piece  of  work  such 
that  the  quality  of  the  work  would  be  too  low,  or  unacceptable.  Or  similarly,  inordinate  attention 
to  work  quality  can  increase  the  work  completion  time  making  it  too  long,  or  unacceptable.  It 
is  desirable,  then,  to  somehow  account  for  quantity  when  collecting  quality-based  measures,  and 
quality  when  collecting  quantity-based  measures.  This  is  to  ensure  at  least  minimum  acceptability. 

As previously mentioned, the Air Force in its Productive Capacity Project has attempted
to build a minimum acceptable quality standard into its quantity-based measures. This is accomplished
by phrasing the job performance measurement question as, “How long does it take
to complete a piece of work while ensuring some acceptable level of quality?” (The actual data
collection format and instrument will be discussed in Section 2.3.2.5.) For quality-based measures,
quantity can likewise be accounted for by asking a measurement question like, “How well can the
work be completed in some acceptable (or fixed) amount of time?”


The  built-in  quality  considerations  are  the  Air  Force's  current  method  for  addressing  the 
quantity/quality  tradeoff  when  collecting  its  quantity-based  measures. 

2.3.1.5 Productive Capacity. Productive capacity (PC) is a quantity-based job
performance  measure  that  represents  the  maximum  amount  of  work  output  a  given  person  is 
capable  of  producing  on  a  particular  job  or  task  (21). 

Productive  capacity  is  to  be  distinguished  from  productivity.  Productivity  generally  refers  to 
how  much  output  people  typically  yield  on  a  normal,  day  to  day  basis.  Productive  capacity  on  the 
other  hand,  represents  the  amount  of  work  people  are  capable  of  producing  if  they  work  to  their 
full  potential. 

The  distinction  between  productive  capacity  and  productivity  is  important  when  attempting 
to  identify  the  factors  affecting  performance.  It  is  quite  possible,  if  not  likely,  that  factors  affecting 
productive  capacity  are  not  the  same  as  those  affecting  productivity.  Several  recent  studies  have 
supported  this  theory. 

The distinction between productivity and productive capacity was indirectly addressed in
a study conducted by Sackett, Zedeck and Fogli (1988) (34). They made a distinction between
typical and maximum performance. Typical performance generally refers to average or long term
performance, while maximum performance refers to the performance resulting when maximum
effort is given. Sackett and others found low correlation between typical and maximum performance
of supermarket check-out clerks. Their findings suggest that a low correlation would likely exist
between productive capacity, arguably a measure of maximum performance, and productivity, more
a measure of typical performance. The expected low correlation implies that PC and productivity
are measuring different aspects of job performance, and would likely be related to different factors.

2.3.2 Background Literature on Job Performance Measurement in the Air Force. As
mentioned, several operational, practical and congressionally-mandated requirements initially gave
the Air Force motivation to rather aggressively pursue job performance measurement research.
Recently, it has been the Air Force’s desire to develop job performance models that has perpetuated
the motivation and research. As previously discussed, measures of job performance are required in
the development of mathematical job performance models. Following is a review of the Air Force’s
research efforts to develop sound measures of job performance for meeting requirements and for
development of job performance models. Two primary research projects are reviewed, the JPM
and Productive Capacity Projects.

2.3.2.1 The Joint Service Job Performance Measurement Project. In response to
the 1983 congressional mandate to link job performance and enlistment standards, and for numerous
operational reasons, the armed services began a joint research and development project in the early
1980s. The purpose of the project was to explore valid job performance appraisal techniques.
The research was coordinated across the armed services to ensure a common direction of effort, to
avoid duplication of effort, and to facilitate technology transfer between the services. The research
project is known as the joint-service Job Performance Measurement/Enlistment Standards Project,
or simply the Job Performance Measurement (JPM) Project.

2.3.2.2  The  Air  Force's  Job  Performance  Measurement  System  Project.  As  part  of 
the broader JPM Project, the Air Force began its similarly named Job Performance Measurement
System  (JPMS)  Project  (16).  As  the  name  implies,  the  JPMS  Project's  primary  purpose  was  to 
develop  or  identify  a  job  performance  measurement  system  that  is  valid,  meaning  it  would  consist 
of  measures  that  accurately  reflect  how  well  a  job  is  performed.  As  expected,  this  proved  to  be  a 
challenging  task. 

The Air Force developed various job performance measures including hands-on performance
tests, interviews, written tests, and supervisor, peer, and self ratings (3) (15) (16) (23). The primary
performance measure developed under the JPMS Project was the Walk-Through Performance Test
(WTPT) consisting of a hands-on work sample test and an interview portion (15).




The JPMS measures were eventually applied to airmen in eight Air Force Specialties (AFSs)
between  1982  and  1987.  The  results  of  the  JPM  project  are  thoroughly  documented  by  Laue, 
Teachout,  and  Harville  (1992)  and  in  numerous  technical  papers  produced  by  the  Technical  Training 
Research  Division,  Human  Resources  Directorate,  Armstrong  Laboratory  (19). 

As the 1980s ended, the JPM Project, at least for the Air Force, drew to a close. However,
there were no plans to operationally implement the JPM measures because of the cost and
practicality problems addressed in the next section.

2.3.2.3 Problems With the Air Force's Job Performance Measurement System Measures.
Despite the success of the JPMS Project in developing sound methods for measuring job
performance, the JPMS measures have some problems which limit their broader use in manpower
modeling. For instance, consider the Walk-Through Performance Test of the JPMS Project. Despite
its attractiveness and validity as a work-sample test, it is very expensive and time consuming
to develop and administer. This is because of the high degree of job and task analysis required, and
because of a frequent need to access subject matter experts (SMEs), usually senior non-commissioned
officers (NCOs). It also requires travel to Air Force bases for access to examinees. Further, it is
intrusive in that the test must be set up and administered in the actual workplace. Finally, it
requires several hours of the examinees’ time, which means they must be absent from their daily
duties. These factors significantly lower the utility of the measure for any kind of widespread use.

A second problem with the JPMS measures is that they are in a form that is not very useful
for manpower planning (21:3). The measures are quality-based, generally in a form representing
percent correct on a performance test, or a performance rating on a quality-anchored rating scale.
Such quality-based measures have obscure interpretations in manpower decisions which require
work output information (21:3).

Another research effort conducted during the JPM time frame introduced another job performance
measure, called productive capacity, which seemed to be free of many of the troubles of
the JPM measures. A discussion of this initial PC research follows.

2.3.2.4 Initial Productive Capacity Research. Carpenter, Monaco, O’Mara and
Teachout  (1989)  conducted  research  for  the  Air  Force  during  the  same  time  period  as  the  JPM 
Project,  to  explore  the  feasibility  and  utility  of  a  novel  job  performance  measure  called  productive 
capacity,  or  PC  (5).  As  defined  earlier,  PC  is  a  quantity-based  job  performance  measure  that 
represents  the  maximum  amount  of  work  output  a  given  person  is  capable  of  producing  on  a 
particular  job. 

Carpenter and others mathematically defined productive capacity as t*/t, where t* is a standard
representing the fastest possible time in which a given piece of work can be completed, and
t represents the time, on average, it takes the individual under assessment to complete the work.
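
The ratio defined above can be illustrated with a minimal sketch. The function name, the choice of minutes as the unit, and the example values are assumptions made for illustration; they are not taken from Carpenter and others.

```python
def productive_capacity(standard_time: float, individual_time: float) -> float:
    """Return PC = t*/t for one piece of work.

    standard_time   -- t*, the fastest possible completion time
    individual_time -- t, the individual's average completion time
    Both times must be in the same unit (minutes in this sketch).
    """
    if standard_time <= 0 or individual_time <= 0:
        raise ValueError("times must be positive")
    return standard_time / individual_time


# Hypothetical values: t* = 30 minutes, t = 40 minutes.
pc = productive_capacity(30.0, 40.0)
print(f"PC = {pc:.2f}")  # PC = 0.75
```

Because t* is the fastest possible time, an individual's t should not fall below it, so PC computed this way lies between 0 and 1, with larger values indicating faster work.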

The  researchers  investigated  whether  PC  ratings  could  be  effectively  provided  by  Air  Force 
supervisors.  Their  research  involved  personnel  in  career  field  328X0,  Avionics  Communications. 

Prior to collecting data on experimental subjects, benchmark times were assigned to clusters
of tasks representative of the job, by subject matter experts. The benchmarks represented SME
estimates of the average amount of time it would take a first-term airman to complete the task
cluster. The benchmarks were then provided to Air Force supervisors who used them to estimate
work completion times for their personnel.

The PC data collection went as follows. Supervisors selected one of their workers whom they
believed worked closest to the benchmark pace. They then estimated how long it would take each of
their other workers to complete the same amount of work that the benchmark worker could perform
in one hour. This was done for each task cluster. The t* values were obtained by subtracting one
minute from the fastest estimated time for each task cluster.
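
The data collection steps above can be sketched for a single task cluster as follows. This is only an illustration: the worker labels and times are hypothetical, and the sketch assumes the fastest time is taken over the estimates passed to the function, which the source does not specify.

```python
def pc_from_supervisor_estimates(estimated_minutes: dict[str, float]) -> dict[str, float]:
    """Compute PC = t*/t for each worker on one task cluster.

    estimated_minutes maps a worker label to the supervisor's estimate of how
    long that worker needs for the work the benchmark worker does in one hour.
    """
    fastest = min(estimated_minutes.values())
    t_star = fastest - 1.0  # t*: one minute less than the fastest estimate
    if t_star <= 0:
        raise ValueError("fastest estimate must exceed one minute")
    return {worker: t_star / t for worker, t in estimated_minutes.items()}


# Hypothetical estimates (minutes) for a single task cluster.
estimates = {"benchmark": 60.0, "worker_a": 51.0, "worker_b": 75.0}
pcs = pc_from_supervisor_estimates(estimates)
for worker, value in pcs.items():
    print(f"{worker}: PC = {value:.3f}")
```

Subtracting one minute guarantees that t* is strictly below every estimated time, so no worker in the cluster receives a PC of exactly 1.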

To validate the supervisor estimate technique, the researchers collected more objective performance
data using WTPT methodology, for comparison. Correlations between supervisor ratings
and the objective measures were low to moderate.
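
A comparison of this kind is commonly summarized with a Pearson correlation coefficient. The sketch below assumes that statistic and uses made-up numbers; the source does not state which coefficient the researchers computed, and the data shown are not from the study.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: supervisor-estimated PC and an objective score per airman.
supervisor_pc = [0.62, 0.75, 0.81, 0.58, 0.90, 0.70]
objective_score = [71.0, 64.0, 80.0, 69.0, 77.0, 66.0]
print(f"r = {pearson_r(supervisor_pc, objective_score):.2f}")
```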

Overall, the research indicated the supervisor estimate methodology for obtaining productive
capacity data had promise. This is true especially when considering the cost and time-consuming
nature of empirically deriving t* and t values by actually timing airmen while they perform
job-related tasks. Unfortunately, the study indicated that more development of the productive capacity
measure was needed.

This research had several associated problems. The first problem was the use of a benchmark
worker as a basis for comparison when supervisors made their time estimates. Because supervisors
selected unique benchmark workers (from among their own subordinates), there was, to some degree,
a floating reference point between supervisors when estimating performance times. This may have
introduced bias into the ratings.

Second, the PC measures were computed from time estimates that were reflective of an
individual’s  performance  on  average.  The  PC  measures  derived  from  these  times  do  not  reflect  true 
productive  capacity,  but  average  productivity.  This  deviates  from  the  definition  of  PC  as  previously 
expressed. 

A  third  problem  was  that  only  a  single  benchmark  time  was  used  by  supervisors  when  selecting 
their  benchmark  worker,  and  indirectly  when  making  their  time  estimates.  The  single  benchmark 
represented  the  average  amount  of  time  it  takes  a  first  term  airman  to  complete  work.  The  problem 
with  a  single  benchmark  is  that  it  says  nothing  about  the  variance  and  distribution  of  performance 
times.  This  paints  an  incomplete  picture  of  the  range  of  performance  times  that  might  be  expected 
across  individuals.  Supervisors  probably  used  their  own  assessments  of  what  the  distribution  of 
performance times was like and further biased the ratings.

A fourth problem was that the study looked at only one job. It is difficult to comment on
the  utility  of  the  PC  measure  for  widespread  use  without  looking  at  its  performance  in  a  number 
of  AFSs. 

2.3.2.5 The Productive Capacity Project. As the JPMS Project and initial PC
research drew to a close, the Air Force recognized that many operational and modeling needs for
valid job performance measures would remain unsatisfied. The JPMS measures were useful in
fulfilling the congressional mandate, and the initial PC research demonstrated the potential of a
new measurement technique. But neither effort provided a valid and efficient measure suitable for
broader use in addressing operational concerns and in development of job performance models. The
Air Force realized it must conduct further research to develop a measure that could better satisfy
its needs.

The Air Force reviewed its performance measurement research and determined that it would
pursue the development of the PC measure over any of the JPMS measures. This is because PC
offered the most overall promise. The PC measure seems to counter the problems associated with
the JPMS measures in that it is relatively inexpensive to implement and it is quantity-based, and thus
can be meaningful when making manpower decisions. Also, the PC measure as originally defined
seemed to leave room for significant improvement.

As a result, the Air Force began its Productive Capacity Project, with the goal of improving
the PC measure so that it could be used to address operational concerns and to serve as a basis in
manpower modeling.

The first effort of the Productive Capacity Project was an attempt to address the problems
associated with the initial PC measure (21). Instead of having supervisors use a benchmark worker
as a reference when estimating performance times, the researchers had them use time-anchored
rating scales derived from subject matter experts as the reference. Next, supervisors were not
asked to estimate individuals’ typical or average performance times, but their fastest possible
performance times. And, as an alternative to providing supervisors with only a single benchmark
reference time, the time-anchored rating scales used had multiple benchmarks per individual task.
The benchmarks represented estimates of the fastest time in which the task could possibly be
completed, the average time it would take a first term airman to complete the task, and the longest
time that an airman would be allowed to work on the task without negative consequences to the
job. Last, the researchers studied four Air Force jobs to provide a broader view of the PC measure’s
effectiveness.

The  reader  is  referred  to  Measurement  of  Productive  Capacity:  A  Methodology  for  Air  Force 
Specialties,  for  a  complete  description  of  this  PC  research  (21).  Because  the  PC  data  collected  by 
Leighton  and  others  for  AFS  454X1  will  be  used  for  the  analyses  in  this  thesis,  following  is  a  fairly 
detailed  overview  of  the  research. 

An early issue for Leighton and others was the selection of jobs to be studied. The first
job selection consideration was the aptitude category into which jobs are classified. The Air Force
uses a 10-subtest paper-and-pencil test called the Armed Services Vocational Aptitude Battery (ASVAB) to
select recruits for service, and then to place them into jobs. Air Force jobs can be classified into
four categories corresponding to four ASVAB composite scores. The job types and corresponding
composite scores are Mechanical (M), Administrative (A), General (G), and Electronic (E). The
composites are referred to as aptitude indices (AIs), and theoretically measure aptitude in their
named area. Each Air Force job is associated with at least one AI, by the nature of the work
performed in the job. There are minimum AI cutoff scores that individuals must exceed to enter
the various job types (8).

To  assess  the  utility  and  validity  of  the  PC  measure  across  a  variety  of  jobs,  the  researchers 
opted  to  select  one  job  from  each  aptitude  area  for  the  study.  They  also  chose  to  select  from  among 
the eight jobs analyzed under the JPMS Project. This was to take advantage of the extensive task
analysis information previously compiled. Also, the four jobs studied latest in the JPMS Project
were given preference because written job knowledge tests (JKTs) were created for them (3). These JKTs
were identified as potentially useful measures for PC validation.

Table 2. Air Force Specialties Selected for the Initial Study of the Productive Capacity Project

Specialty Code   Specialty Name                                   ASVAB Aptitude Index
122X0            Aircrew Life Support                             General (G)
454X1            Aerospace Ground Equipment                       Mechanical (M)
455X2            Avionic Communications and Navigation Systems    Electronic (E)
732X0            Personnel                                        Administrative (A)

Table 3. Number of Tasks Selected for the Initial Study of the Productive Capacity Project

Specialty Code   Number of Tasks
122X0            45
454X1            50
455X2            41
732X0            36


A last consideration in job selection was the availability of airmen to serve as experimental
subjects. After consideration of all factors, the four jobs listed in Table 2 were selected.


(Under  the  JPMS  Project,  455X2  appeared  as  328X0,  Avionic  Communications.  The  455X2  title 
reflects  the  combination  of  AFSs  328X0,  328X1,  and  328X4.  Similarly,  454X1  appeared  as  423X5.) 

After  selecting  the  jobs,  the  analysts  were  faced  with  the  issue  of  selecting  which  tasks  from 
within  the  jobs  would  be  studied.  As  with  the  JPMS  Project’s  Walk-Through  Performance  Test, 
the  task  level  was  chosen  as  the  appropriate  level  of  job  detail  for  collecting  the  PC  data. 

Tasks from the WTPT were highly desirable candidates for the PC research because they
were very well articulated and broken down into great detail as part of the WTPT development.
Unfortunately, there were not enough WTPT tasks generalizable to all positions within a given
AFS to provide an overall view of an individual's PC. The researchers subsequently selected additional
tasks from task inventory data collected by the Occupational Measurement Squadron (OMS),
Randolph AFB, TX. The final numbers of tasks are listed in Table 3.


Table 4. 454X1 Job Duty Areas

Duty Area   Description
A           Organizing and Planning
B           Directing and Implementing
C           Inspecting and Evaluating
D           Training
E           Performing General Administrative Tasks
F           Performing Preoperations or Service Inspections
G           Performing Periodic Inspections
H           Maintaining AGE Electrical or Electronic Systems
I           Maintaining AGE Engines, Motors, or Generators
J           Maintaining AGE Heating Systems
K           Maintaining AGE Refrigeration Systems or Equipment Coolers
L           Maintaining AGE Test Stand, Bomblift, or General Servicing Hydraulic Systems
M           Maintaining AGE Pneumatic Systems
N           Maintaining AGE Enclosures, Chassis, or Drives
O           Maintaining Mobile Tactical Air Control Systems Equipment
P           Dispatching AGE
Q           Maintaining Special Tools or Shop Equipment
R           Performing Quality Assurance Tasks
S           Performing Nonpowered AGE Maintenance
T           Performing Cross-Utilization Tasks

Job tasks are typically coded by OMS (40). Task codes consist of a letter prefix and a
numeric suffix. The letter prefix identifies which job duty area the task is from, and the numeric
suffix differentiates tasks within the duty areas. Because data for AFS 454X1 were analyzed in this
thesis, Table 4, which lists the 454X1 job duty areas, and Table 22 at Appendix A, which lists and
describes the 50 454X1 tasks analyzed, are included (40).
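
The task coding scheme described above can be sketched as a small parser. The exact written form of an OMS task code is an assumption here (one duty-area letter immediately followed by digits), and the example code "H12" is hypothetical; the duty area descriptions are those of Table 4.

```python
import re

# Duty areas for AFS 454X1, as listed in Table 4.
DUTY_AREAS = {
    "A": "Organizing and Planning",
    "B": "Directing and Implementing",
    "C": "Inspecting and Evaluating",
    "D": "Training",
    "E": "Performing General Administrative Tasks",
    "F": "Performing Preoperations or Service Inspections",
    "G": "Performing Periodic Inspections",
    "H": "Maintaining AGE Electrical or Electronic Systems",
    "I": "Maintaining AGE Engines, Motors, or Generators",
    "J": "Maintaining AGE Heating Systems",
    "K": "Maintaining AGE Refrigeration Systems or Equipment Coolers",
    "L": "Maintaining AGE Test Stand, Bomblift, or General Servicing Hydraulic Systems",
    "M": "Maintaining AGE Pneumatic Systems",
    "N": "Maintaining AGE Enclosures, Chassis, or Drives",
    "O": "Maintaining Mobile Tactical Air Control Systems Equipment",
    "P": "Dispatching AGE",
    "Q": "Maintaining Special Tools or Shop Equipment",
    "R": "Performing Quality Assurance Tasks",
    "S": "Performing Nonpowered AGE Maintenance",
    "T": "Performing Cross-Utilization Tasks",
}

# Assumed code layout: one duty-area letter followed directly by digits.
TASK_CODE = re.compile(r"([A-T])(\d+)")


def parse_task_code(code: str) -> tuple[str, int, str]:
    """Split a task code into (duty area letter, task number, duty area name)."""
    match = TASK_CODE.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized task code: {code!r}")
    prefix, suffix = match.group(1), int(match.group(2))
    return prefix, suffix, DUTY_AREAS[prefix]


print(parse_task_code("H12"))
```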


The  task  descriptions  in  Table  22  at  Appendix  A  do  not  exactly  match  the  descriptions 
maintained  by  OMS.  The  task  descriptions  had  to  be  modified  for  the  Productive  Capacity  Project 
to  clearly  define  a  task  by  specifying  exact  equipment  and  precise  starting  and  stopping  points  so 
that  accurate  completion  time  estimates  could  be  made. 

After task selection, the researchers had to establish benchmark times for the tasks. The
benchmarks were needed for the creation of the rating scales to be used by the supervisors in
estimating the work completion times of their subordinates. Three benchmarks were derived for
each task. These represented the fastest time in which the task could be completed, the average
time it takes a first term airman to complete the task, and the longest time that an airman would
be allowed to work on the task without significant consequences to the job.
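
One way to picture the three benchmarks for a task is as a small record with an ordering check. This is a sketch under assumptions: the class name, field names, the use of minutes, and the on_scale helper are all hypothetical and are not part of the project's instruments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskBenchmarks:
    """Three SME benchmark times for one task, in minutes (assumed unit)."""
    fastest: float   # fastest time in which the task could be completed
    average: float   # average time for a first term airman
    longest: float   # longest time allowed without significant consequences

    def __post_init__(self) -> None:
        # The benchmarks only make sense as anchors if they are ordered.
        if not (0 < self.fastest <= self.average <= self.longest):
            raise ValueError("expected 0 < fastest <= average <= longest")

    def on_scale(self, estimate: float) -> bool:
        """True if an estimated time falls between the fastest and longest anchors."""
        return self.fastest <= estimate <= self.longest


# Hypothetical benchmarks for one task.
example = TaskBenchmarks(fastest=20.0, average=35.0, longest=60.0)
print(example.on_scale(30.0))  # True
print(example.on_scale(75.0))  # False
```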

To get these benchmarks, six SMEs from each job were assembled for workshops at Brooks
AFB, TX. The workshops for each job were held separately. During the workshops, the SMEs were
presented the task lists corresponding to their given jobs. The Nominal Group Technique (NGT)
was used to reach consensus among the SMEs for each benchmark for each task (14).

A  detailed  analysis  of  the  interrater  agreement  of  the  SMEs  when  providing  the  benchmarks 
was  accomplished  by  Skinner,  Faneuff,  and  Demetriades  (1991)  (39).  Overall,  they  found  that  there 
tends  to  be  very  strong  agreement  among  SMEs  when  estimating  the  benchmarks. 

To gain access to supervisors and airmen to serve as experimental subjects, it was necessary
for the researchers to visit a number of Air Force bases. The primary considerations in selecting
the Air Force bases included the following:

•  The  number  of  potential  subjects  available  at  each  base 

•  Base location (Continental U.S. or overseas)

•  Base  mission  (training,  classified,  etc.) 

The  researchers  determined  that  10  bases  would  be  visited.  The  bases  are  listed  in  Table  5. 

Table 5. Bases Visited in the Initial Study of the Productive Capacity Project

Air Force Base
Travis AFB, CA
Beale AFB, CA
George AFB, CA
Davis-Monthan AFB, AZ
Holloman AFB, NM
Langley AFB, VA
Shaw AFB, SC
Offutt AFB, NE
Eglin AFB, FL
Charleston AFB, SC

A sample size of 200 airmen per AFS was targeted. This was the maximum number that
could be tested given project resources. Also, a sample size of 200 was considered sufficient to
support planned analyses. Subjects for each AFS were selected to be representative of the base
populations in terms of three factors:

•  Job experience

•  Race

•  Gender

Job  experience  was  considered  a  very  important  factor  because  of  its  hypothesized  statistical 
relationship  with  PC.  Because  of  the  hypothesized  relationship,  an  attempt  was  made  to  get  subjects 
across  a  range  of  experience.  This  would  allow  the  hypothesis  to  be  appropriately  tested. 

Experience  was  expressed  in  terms  of  skill  level.  Skill  level  is  a  variable  used  by  the  Air 
Force.  It  ranges  from  0  to  9,  and  it  represents  the  amount  of  training,  expertise,  and  experience 
an  airman  has  on  a  given  job.  Skill  levels  3  and  5  were  sought  because  they  indicate  that  an 
airman  is  performing  mostly  hands-on  production  work,  as  opposed  to  receiving  technical  training 
or  performing  supervisory  duties.  Race  and  gender  factors  were  considered  important  to  allow  for 
future  investigation  of  differential  effects  of  the  PC  measure  across  race  and  gender  groups. 
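
Selecting subjects to be representative of a population on some factor is often done by proportional allocation. The sketch below assumes that approach, with largest-remainder rounding; the source does not say how the target numbers were computed, and the strata and counts shown are hypothetical.

```python
def allocate_targets(population_counts: dict[str, int], total_sample: int) -> dict[str, int]:
    """Split total_sample across strata in proportion to population counts.

    Uses largest-remainder rounding so the targets sum exactly to total_sample.
    """
    total = sum(population_counts.values())
    if total <= 0:
        raise ValueError("population must be non-empty")
    exact = {k: total_sample * v / total for k, v in population_counts.items()}
    targets = {k: int(q) for k, q in exact.items()}  # round down first
    shortfall = total_sample - sum(targets.values())
    # Hand the leftover slots to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - targets[k], reverse=True)
    for k in by_remainder[:shortfall]:
        targets[k] += 1
    return targets


# Hypothetical population of one AFS, broken out by a single factor.
population = {"skill_level_3": 300, "skill_level_5": 500, "other": 200}
print(allocate_targets(population, 200))
```

The same allocation could be repeated for each factor of interest, or applied to cells defined by combinations of factors, depending on how representativeness is defined.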

The researchers reviewed distributions of personnel at the participating bases and developed
target numbers of subjects. The actual individual test subjects were selected by the participating
bases, using guidance from the researchers. The bases had to select the subjects because they had
the most current information on manning requirements, deployments, and personnel status. One
problem with having the bases select the subjects was that no consideration could be given to
subject aptitude level. This is because ASVAB scores were not available in base-level personnel
records. Like experience, aptitude is expected to be related to PC and it would have been desirable
to sample subjects from a range of aptitude levels. The final sample sizes are listed in Table 6.

Table 6. Sample Sizes for the Initial Study of the Productive Capacity Project

Specialty Code   Number of Subjects
122X0            159
454X1            204
455X2            155
732X0            193

The  primary  focus  of  Leighton  and  others’  research  was  to  collect  appropriate  data  to  allow 
them  to  assess  how  well  supervisors  can  estimate  the  task  completion  times  of  their  subordinates. 

This  means  that  the  primary  measurement  instrument  of  the  study  was  the  time  estimation 
forms  and  accompanying  booklets  used  by  the  supervisors  to  estimate  how  long  it  would  take  their 
subordinates  to  complete  the  tasks  being  studied  (24).  The  rating  forms  and  booklets  provided 
the  supervisors  with  detailed  task  descriptions,  and  a  time  line  showing  the  fast,  normal,  and  slow 
times  for  task  completion. 

It was on the estimation forms that the supervisors provided the task completion time estimates,
as well as an indication of how frequently they have seen the ratee complete the task
(Regularly, Often, Never). In making their time estimates, supervisors were told to “think about
how long it would take each airman to do the task if he or she were working as quickly as they
could, while maintaining satisfactory performance” (21:52).

In addition to using the forms to estimate task performance times, the supervisors used them
to provide an overall or global estimate of their subordinates’ productive capacity. The supervisors
were asked to answer the following question: “In this specialty, consider the maximum amount of
acceptable work that can be done by a person in a typical day as 100 percent. What percent of the
maximum could the person you are currently rating do in a typical day?” (21:52). This measure was
of secondary interest, and was collected for use as an object of comparison for the time estimates.
Figure 6 provides an example of the time estimation form for 454X1.

Besides  the  PC  rating  forms,  many  other  instruments  were  used  during  the  study.  These 
instruments  are  to  be  used  to  validate  the  PC  estimation  methodology,  and  to  investigate  for 
relationships  between  scores  from  these  instruments  and  PC.  Other  data  forms  were  used  to  collect 
background  information  on  the  experimental  subjects  and  their  rating  supervisors. 


The  main  instrument  for  validating  the  PC  estimation  methodology  is  a  hands-on  test  similar 
to  the  hands-on  portion  of  the  VVTPT  developed  under  the  Air  Force’s  JPMS  Project.  For  the 
test,  a  relatively  small  subset  of  tasks  was  chosen  from  each  job  (between  8  and  11).  A  subsample 
of  the  experimental  subjects  were  then  chosen  to  actually  perform  the  tasks  (60  airmen  from  each 
AFS).  As  the  subjects  performed  the  tasks,  the  researchers  used  a  stopwatch  to  determine  their 
performance  times.  This  was  determined  to  be  the  best  possible  way  to  validate  the  supervisor 
estimates.  Also,  JKTs  were  administered  to  subjects  in  three  of  the  four  jobs  studied  (none  was 
available for 455X2). JKTs are written, task-based, multiple-choice tests designed to measure
how well an airman knows the procedures required to perform job tasks (3). The JKTs are to
serve as a basis of comparison against which to evaluate the PC measure. In previous studies,
corrected correlations between the hands-on portion of the WTPT and JKTs were found to be
between .50 and .80, indicating a moderate to high level of linear relationship (19:11). Since the
estimated PC measure in the current study and the JKT are both purported to measure job
performance, it was expected that these measures would be correlated to some degree as well. High
correlation was not expected because the instruments likely measure different dimensions of
performance: the JKT deals with how well an individual knows the job, and PC deals with how long
it takes an individual to do work on the job.

Other measures that were administered include a 160-item interest inventory, the Vocational
Interest For Career Enhancement (VOICE), which was administered to subjects to determine their
level of interest in their current job (9) (12). A 30-item motivation measure, the Generalized
Motivation Scale (GMS), was also administered to the subjects (32) (33). The GMS was administered
to allow for investigation of any relation between overall motivation and PC.

[Figure 6. Example of the 454X1 Rating Form. The form is headed AFS 454X1, Aerospace Ground
Equipment (AGE) Specialist. For each task, the rater checks how often the incumbent is observed
performing the task (R = Regularly, O = Occasionally, N = Never) and records the incumbent's
performance time in hours, minutes, and seconds against a consensus performance time scale
marked Fastest, Normal, and Slowest. The form ends with the global question on percent of
maximum acceptable work in a typical day, answered on a scale from 1% to 100%.]

Numerous  other  data  collection  forms  were  used  to  gather  background  information  on  both 
subjects  and  supervisors. 

The project sponsor, AL/HRM, has assembled a rich data base made up of individual subject
records containing the project measures described above. Adding to its value, data from other
important Air Force files have been added to ensure a complete background on the experimental
subjects. Data added from the Uniform Airmen Record (UAR), a periodically-updated file
maintained by the Air Force Military Personnel Center (AFMPC), included education level, race,
ethnicity, and the date on which the subject began active service. Data from Military Entrance
Processing Station (MEPS) files included aptitude scores and other background information for
cross-checking purposes.

2.4 The Predictors of Job Performance.

In  the  previous  section,  significant  discussion  concerning  the  job  performance  model  response, 
job  performance,  was  provided.  Next,  discussion  focuses  on  the  predictor  variables.  Recall  that  the 
predictor  variables  to  be  used  in  this  thesis  are  aptitude  and  experience. 

Numerous  factors  are  thought  to  influence  the  job  performance  of  individuals.  These  include 
personality  traits,  job  satisfaction,  job  interest,  aptitude,  and  experience,  to  name  just  a  few. 
Psychological  research  is  filled  with  studies  showing  the  effects  of  such  factors  on  performance. 

It is important to note that the job performance measure under study in this thesis is productive
capacity, which is distinguished from productivity. Many individual attributes that influence
productivity, like job interest, motivation, and other personality factors, were not expected to
influence productive capacity because PC is a measure of a person's capacity to produce, not their
actual production. PC is theoretically independent of how a person views the work or feels about
the job. PC, however, was not believed to be independent of such things as a person’s mental
aptitude or job experience, since these likely influence a person’s capacity to produce. Because of
the hypothesized relationships between aptitude, experience and productive capacity, the Air
Force’s emphasis has been on aptitude and experience as predictors of PC (5) (13). This thesis
continued with the analysis of aptitude and experience as predictors.

Table 7. ASVAB Subtests

Subtest Name                       No. of Items    Testing Time (Min.)
General Science (GS)                    25                 11
Arithmetic Reasoning (AR)               30                 36
Word Knowledge (WK)                     35                 11
Paragraph Comprehension (PC)            15                 13
Numerical Operations (NO)               50                  3
Coding Speed (CS)                       84                  7
Auto-Shop Information (AS)              25                 11
Mathematics Knowledge (MK)              25                 24
Mechanical Comprehension (MC)           25                 19
Electronics Information (EI)            20                  9

In  Air  Force  studies,  aptitude  is  usually  expressed  in  terms  of  scores  on  the  ASVAB.  As 
previously  mentioned,  the  ASVAB  is  a  10-subtest,  paper-and-pencil  test  given  to  all  armed  service 
and  Coast  Guard  applicants  (10).  The  test  is  designed  to  measure  aptitude  in  various  areas.  The 
applicants’  ASVAB  scores  determine  whether  or  not  they  are  selected  for  service,  and  if  so,  what 
type  of  job  they  are  classified  into  (10). 


The Air Force uses five ASVAB composite scores to select and classify applicants and recruits.
Table 7 and Table 8 show the ASVAB subtests and composites, respectively, used by the Air Force.


The ASVAB is validated against a number of criteria by each of the services. The Air Force
typically uses the final grades Air Force recruits receive in technical training schools as validation
criteria. For instance, Ree and Earles (1992) accomplished an ASVAB validation study in which
they analyzed data from more than 88,700 Air Force recruits completing 150 training courses (31).
For 22 jobs which use the M composite for selection, the corrected-for-range-restriction correlation
coefficients between the M composite and final school grades ranged between .63 and .78. For 11
jobs which use the A composite, the correlation coefficients between A and final school grades
ranged from .58 to .74. For 52 jobs using G, correlation coefficients ranged from .04 to .85. And for
44 jobs using E, the correlation coefficients ranged from .56 to .90 (31:11-13). These moderate to
high correlation coefficients tend to indicate the ASVAB is valid, at least for predicting training
school success.

Table 8. ASVAB Composites Used by the Air Force

Composite Name                              Definition
Armed Forces Qualification Test (AFQT)      2VE + AR + MK
Verbal (VE)                                 WK + PC
MAGE composites:
  Mechanical (M)                            MC + GS + 2AS
  Administrative (A)                        NO + CS + VE
  General (G)                               VE + AR
  Electronic (E)                            AR + MK + EI + GS

The composites are computed using subtest standard scores.
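As an illustration of how the composite definitions in Table 8 combine subtest standard scores, the following is a minimal Python sketch. The function name, the dictionary layout, and the example scores are assumptions made for illustration only; any later conversion of the sums to percentiles or other reporting scales is not shown. Note that the key `PC` here is the Paragraph Comprehension subtest, not productive capacity.

```python
def air_force_composites(s):
    """Sum subtest standard scores into the composites listed in Table 8.

    `s` maps subtest abbreviations (Table 7) to standard scores.
    """
    ve = s["WK"] + s["PC"]  # Verbal: Word Knowledge + Paragraph Comprehension
    return {
        "VE": ve,
        "AFQT": 2 * ve + s["AR"] + s["MK"],
        "M": s["MC"] + s["GS"] + 2 * s["AS"],
        "A": s["NO"] + s["CS"] + ve,
        "G": ve + s["AR"],
        "E": s["AR"] + s["MK"] + s["EI"] + s["GS"],
    }


# Hypothetical applicant with a standard score of 50 on every subtest.
subtests = ["GS", "AR", "WK", "PC", "NO", "CS", "AS", "MK", "MC", "EI"]
scores = {name: 50 for name in subtests}
composites = air_force_composites(scores)
```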

This has long been the Air Force’s method of choice for validating the ASVAB, but it is
recognized that validating the ASVAB against training grades does not necessarily equate to
validating the ASVAB against job performance. However, studies by Carpenter and others, Faneuff
and others, and AL/HRM indicate that ASVAB scores can potentially be a significant predictor of
PC, a job performance measure (5) (7) (13) (38).

Experience measures in Air Force job performance R&D are usually expressed in terms of
total months of active federal military service (TAFMS). This is generally used as a surrogate for
job experience because job experience indicators are not readily obtainable from existing computer
files. The reason job experience is considered important as a predictor can be traced to learning
curve theory. Learning curve theory basically states that the time it takes to complete a unit of
work will decrease as the operator becomes more experienced (41). This suggests that PC will
likewise be affected by job experience because PC is computed from performance time data. As a
result, experience is an important predictor in PC prediction models.

2.5 The Relationship Between Job Performance, Aptitude, and Experience.

The  previous  sections  discussed  the  response,  job  performance,  and  the  predictors,  aptitude 
and  experience.  This  section  discusses  how  job  performance  has  been  shown  to  relate  to  the 
predictors  in  previous  modeling  efforts. 

To  analyze  the  effects  of  aptitude  and  experience  on  job  performance  Schmidt,  Hunter,  and 
Outerbridge  (1986)  performed  a  study  based  on  a  sample  of  1,474  civilian  and  military  personnel 
(35).  They  used  path  analysis  to  analyze  the  impact  of  job  experience  and  mental  ability  on 
job  performance.  The  measures  of  job  performance  used  were  written  job  knowledge  tests,  work 
sample  tests,  and  supervisory  ratings  of  job  performance.  Their  findings  suggest  that  job  experience 
affects job performance in two ways. First, greater job experience indirectly affects performance
because  it  leads  to  greater  acquisition  of  job  knowledge.  The  greater  job  knowledge  leads  to  greater 
performance.  Second,  job  experience  directly  affects  the  ability  of  people  to  perform  work-related 
activities  as  indicated  by  work  sample  tests.  Mental  ability  was  found  to  have  the  same  pattern 
and  magnitude  of  relationships  on  job  knowledge  and  work  sample  performance  as  experience. 

Schmidt,  Hunter,  Outerbridge  and  Goff  (1988)  conducted  a  study  based  on  the  same  sample 
as  the  previously  cited  study,  to  analyze  the  joint  relation  of  experience  and  mental  ability  with  job 
performance  (36).  They  tested  three  hypotheses.  The  first,  the  divergence  hypothesis,  “predicts 
that  as  job  experience  increases,  the  performance  difference  between  high-  and  low-ability  employees 
will increase” (36:40). The second, the convergence hypothesis, “proposes that as employees gain
job experience, initial ability becomes less important as a determinant of job performance” (36:40).
Last, the noninteractive hypothesis states “experience increases job performance of high- and
low-ability employees at the same rate” (36:47). In other words, the third hypothesis states that
there is no interaction between experience and ability. Their findings support the noninteractive
hypothesis, and indicate that mental ability and experience are important determinants of job
performance.

In a similar study, Alley and Teachout (1990) used the WTPT data collected during the JPMS
Project (1). Like Schmidt, Hunter, Outerbridge and Goff, their findings support the noninteractive
hypothesis and the conclusion that mental ability and experience are important determinants of job
performance.

2.6  Air  Force  Job  Performance  Modeling  Research. 

The  preceding  sections  provided  an  overview  of  linear  models  and  discussions  of  the  response, 
PC, and the predictors, aptitude and experience. The stage has thus been set for discussion of
specific  Air  Force  studies  in  which  the  response  variable  was  PC  or  raw  performance  times,  and 
the  predictors  were  aptitude  and  experience. 

Carpenter and others used the logistic growth model in Equation 11 to model PC (5:21).

$$PC = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon)}} \qquad (11)$$

where

$PC$ = productive capacity
$x_1$ = experience (months in Air Force)
$x_2$ = ASVAB aptitude score (Electronic Composite)
$\beta_0, \beta_1, \beta_2$ = parameters to be estimated
$\epsilon$ = the model error terms.

Note that the logistic model in its original form was not a linear mathematical model because
it was not linear with respect to the $\beta$ parameters. However, the logistic model was linearized for
application of linear regression.
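The linearization idea can be sketched in a few lines of Python: apply the logit transform to PC scores that lie strictly between zero and one, then estimate the coefficients by ordinary least squares. This is only an illustrative sketch under stated assumptions. The function names, the synthetic data, and the coefficient values are invented here and are not taken from Carpenter and others.

```python
import numpy as np


def logit(p):
    """Log-odds transform; maps values in (0, 1) onto the real line."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


def fit_linearized_logistic(pc, experience, aptitude):
    """Least-squares fit of logit(PC) = b0 + b1*experience + b2*aptitude."""
    y = logit(pc)
    X = np.column_stack([np.ones_like(y), experience, aptitude])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic, noise-free illustration (invented values, not thesis data).
rng = np.random.default_rng(0)
experience = rng.uniform(1, 48, size=50)   # months of experience
aptitude = rng.uniform(40, 95, size=50)    # composite aptitude score
true_b = np.array([-2.0, 0.03, 0.02])      # hypothetical coefficients
eta = true_b[0] + true_b[1] * experience + true_b[2] * aptitude
pc = 1.0 / (1.0 + np.exp(-eta))            # logistic curve, values in (0, 1)
est_b = fit_linearized_logistic(pc, experience, aptitude)
```

Because the synthetic data contain no error term, the fit should recover the hypothetical coefficients almost exactly; with real data the estimates would carry sampling error.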


Carpenter and others linearized the logistic model by making the transformations indicated
in Equation 12 (5:21-23). Linearizing the model equation as such allowed for estimation of the
model parameters using least squares estimation.

$$\ln\left(\frac{PC}{1 - PC}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon \qquad (12)$$

where

$\ln\left(\frac{PC}{1 - PC}\right)$ = the logit of productive capacity
$x_1$ = experience (months in AFS)
$x_2$ = ASVAB aptitude score (Electronic Composite)
$\beta_0, \beta_1, \beta_2$ = parameters to be estimated
$\epsilon$ = the model error terms.

Using Equation 12, Carpenter and others modeled PC at the task cluster level, and also at
the aggregate or overall level. The aggregate measure predicted was computed from a weighted
average of task cluster performance times. They analyzed a total of 10 task clusters. Across the
10 task clusters, the estimated experience coefficient was significantly different from zero at the
$\alpha = .05$ level in seven cases, and the estimated aptitude coefficient was significant in four cases
(5:22). Model $R^2$s ranged from .00 to .39 across the clusters, and the models showed significant
regression relations in eight cases. For the aggregate model, both the estimated experience and
aptitude coefficients were significant at the $\alpha = .05$ level. The aggregate model $R^2$ was .44 and the
model regression relation was significant at the $\alpha = .05$ level. Overall, the results suggest the
supervisor estimate method for generating individual performance times has potential. But, as
Carpenter and others point out, further refinement is needed (5:51).

While Carpenter and others used the logistic model for predicting PC, Faneuff and others
found that a linear model provided better model fit than did the logistic model (13:9). Faneuff and
others estimated PC at the overall or aggregate level using the model expressed in Equation 13
(13:9-10). Faneuff and others computed PC as WTPT score / maximum observed WTPT score,
using data collected under the Air Force’s JPMS Project.


$$PC = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon \qquad (13)$$

where

$PC$ = productive capacity
$x_1$ = ASVAB aptitude score
$x_2$ = experience (months in Air Force)
$x_3$ = a binary variable representing skill level (coded 1 if skill is 5 or higher, 0 otherwise)
$\beta_0, \beta_1, \beta_2, \beta_3$ = parameters to be estimated
$\epsilon$ = the model error terms.


The model was estimated for six of eight jobs studied under the JPMS Project. One job,
Aerospace Ground Equipment (the job studied in this thesis), was analyzed using two ASVAB
aptitude composites, Electronic and Mechanical, yielding a total of seven possible prediction
models. The regression results showed a significant aptitude coefficient in four of seven total cases,
a significant experience coefficient in four cases, and a significant skill level coefficient in three
cases (all coefficients were tested at the $\alpha = .05$ level). Model $R^2$s ranged from .10 to .23 (13:9-10).

AL/HRM modeled estimated performance time data (as opposed to PC data) at the task
level, using the learning curve model expressed in Equation 14 (7) (38). The data used were those
collected by Leighton and others for the Aerospace Ground Equipment specialty (the same data
used in this thesis) (21).

$$\ln(t) = \beta_0 + \beta_1 \ln(x_1) + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon \qquad (14)$$

where

$t$ = estimated task performance time
$x_1$ = experience (months on the job)
$x_2$ = ASVAB aptitude score (Mechanical composite score)
$\beta_0, \beta_1, \beta_2, \beta_3$ = parameters to be estimated
$\epsilon$ = the model error terms.

A general form of the learning curve is expressed in Equation 15 (2) (18). Like the logistic
model used by Carpenter and others, the learning curve model in its original form is not a linear
model. But, like the logistic model, the learning curve model can be linearized so that its parameters
can be estimated via least squares. The linearized learning curve model is expressed in Equation 16.
Note that AL/HRM’s linearized model (Equation 14) is analogous to the general form of the
linearized learning curve model (Equation 16). A typical learning curve is plotted in Figure 7.

$$t = A x^{\beta_1} + \epsilon \qquad (15)$$

where

$t$ = task performance time
$x$ = units of experience
$A, \beta_1$ = parameters to be estimated
$\epsilon$ = the model error terms.

[Figure 7. Plot of a Typical Learning Curve: performance time (t) plotted against experience (x).]

Equation 15 can be written in linear form as Equation 16.

$$\ln(t) = \beta_0 + \beta_1 \ln(x) + \epsilon \qquad (16)$$

where

$t$ = task performance time
$x$ = units of experience
$\beta_0$ = $\ln(A)$, a parameter to be estimated
$\beta_1$ = a parameter to be estimated
$\epsilon$ = the model error terms.
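To make the linearized learning curve concrete, the sketch below regresses ln(t) on ln(x) by least squares and back-transforms the intercept to recover A. It is a minimal illustration only. A log-log fit treats the error as additive on the log scale, which is an assumption of this sketch, and the function name and data values are invented.

```python
import numpy as np


def fit_learning_curve(x, t):
    """Fit ln(t) = b0 + b1*ln(x) by least squares.

    Returns (A, b1), where A = exp(b0) is the time at one unit of experience.
    """
    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    b1, b0 = np.polyfit(np.log(x), np.log(t), deg=1)
    return float(np.exp(b0)), float(b1)


# Invented, noise-free data following t = 30 * x**(-0.25).
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])   # units of experience
t = 30.0 * x ** -0.25                            # task performance times
A_hat, b1_hat = fit_learning_curve(x, t)
```

A negative slope estimate corresponds to performance time falling as experience accumulates, which is the shape the learning curve theory describes.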

Using the linearized learning curve model expressed in Equation 14, AL/HRM found significant
coefficients for ln(job experience) for 26 of the 50 tasks, significant aptitude coefficients for 18
tasks, and significant aptitude × experience interaction coefficients for 14 tasks (all coefficients were
tested at the $\alpha = .05$ level) (38). Model $R^2$s ranged from .01 to .20. The models showed significant
regression relations at the $\alpha = .05$ level for 41 of the 50 tasks. The inclusion of the aptitude and
interaction terms in their model allowed AL/HRM to create learning curves broken out by aptitude
group. An example of learning curves broken out by aptitude groups can be found in Figure 8.
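One way to picture learning curves broken out by aptitude group is to back-transform a fitted log-time model at a fixed aptitude score and trace it over months of experience. The sketch below does this for two aptitude levels. The coefficient values, the aptitude levels, and the function name are hypothetical, and the interaction term is taken here as the product of months and aptitude, which is an assumption about the model form.

```python
import math


def predicted_time(months, aptitude, b0, b1, b2, b3):
    """Back-transform ln(t) = b0 + b1*ln(x1) + b2*x2 + b3*x1*x2 to a time."""
    log_t = (b0 + b1 * math.log(months) + b2 * aptitude
             + b3 * months * aptitude)
    return math.exp(log_t)


# Hypothetical coefficients, chosen only to produce plausible-looking curves.
coef = dict(b0=4.0, b1=-0.30, b2=-0.010, b3=0.00005)
months_grid = [1, 6, 12, 24, 48]
aptitude_groups = {"low": 45, "high": 85}   # invented group aptitude scores

# One predicted learning curve per aptitude group.
curves = {
    group: [predicted_time(m, apt, **coef) for m in months_grid]
    for group, apt in aptitude_groups.items()
}
```

With these invented coefficients both curves fall with experience and the high-aptitude curve lies below the low-aptitude curve; real fitted coefficients could give a different ordering or shape.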

One  significant  problem  with  the  learning  curve  model  is  that  there  is  no  apparent  way  to 
model job performance at the overall job level. To model overall job performance using the learning
curve  model  would  likely  require  aggregation  of  task-level  performance  times.  Such  an  aggregated 
measure  would  have  a  dubious  interpretation. 

2.7 Relating the Literature to the Research Objectives.

The previous sections of this chapter outlined modeling in general, and a great deal of
literature on the Air Force’s job performance modeling R&D. This section serves to provide a brief
overview of the literature with specific reference to the research objectives outlined in Chapter 1.

2.7.1 Formulating a Productive Capacity Measure from Estimated Task Performance Times.

As previously mentioned in Chapter 1, an important research objective of this thesis was to
identify an appropriate way of formulating a PC measure from the estimated task performance
times collected under the Productive Capacity Project (21). This PC formulation was to transform
the  estimated  performance  times  into  a  standardized  measure  that  has  meaning  across  tasks  and 
across  jobs.  Recall  that  such  standardization  was  also  required  for  aggregating  data  across  tasks. 
There  are  probably  numerous  ways  that  the  estimated  task  performance  times  can  be  transformed 
into  a  meaningful  PC  measure.  In  the  literature,  three  ways  to  formulate  a  PC  measure  were 
proposed.  These  are  described  below. 

2.7.1.1 The Formulation of Productive Capacity During the Initial Productive Capacity
Research. In one of the first PC research efforts, Carpenter and others did a study of Avionics
Communications Specialists and proposed the original PC formulation. They computed PC as $t^*/t$,
where $t^*$ is the fastest possible time in which a given amount of work can be completed, and $t$ is
the time that it takes the individual being assessed to complete the work (5:21). In this original
work, the $t^*$ and $t$ measures were applied to clusters of tasks.

This formulation has the desirable quality of ranging from zero to one, which results in an
intuitively appealing interpretation. The measure can be interpreted as an individual's output as
a  proportion  of  maximum  possible  output. 

Although the data collected under the Productive Capacity Project were collected at the task
level  as  opposed  to  the  task  cluster  level,  the  measure  could  just  as  easily  be  formulated  at  the  task 
level,  as  in  the  case  of  the  current  research. 

Faneuff and others used an adaptation of the PC formulation of Carpenter and others in an
effort to extend Carpenter and others’ work to a greater number of jobs (13). The formulation
used by Faneuff and others was in fact $t/t^*$, an apparent inversion of the $t^*/t$ formulation.
However, the $t$ and $t^*$ did not represent performance times but WTPT scores collected under the
Air Force’s JPMS Project. The variable $t$ was an individual's WTPT score while $t^*$ was the highest
obtained score for the sample. Since the $t^*$ and $t$ values represented scores in which higher is
better (the opposite of performance time), Faneuff and others’ formulation is essentially equivalent,
in terms of interpretation, to the Carpenter and others $t^*/t$ formulation.

2.7.1.2 The Formulation of Productive Capacity Under the Productive Capacity Project.
AL/HRM proposed the PC formulation $t/t^*$ (7). This is a simple inversion of the ratio proposed by
Carpenter and others. This reformulation was made because of AL/HRM’s concern that the
original formulation of Carpenter and others does not result in a linear transformation of the
estimated performance time data. This was perceived as a nuisance factor for the type of analyses
AL/HRM was considering. A nonlinear transformation has the ability to adversely influence
measures of linear relationship between two variables, such as the Pearson correlation coefficient.

The AL/HRM formulation does not have the desirable property of ranging from zero to one.
Although PC scores under this formulation can range from one to infinity, the scores still maintain
a degree of interpretability. A PC score of one means an individual is theoretically operating at
maximum possible PC. Scores above one represent multiples of the fastest possible performance
times. For instance, a PC score of two would imply that the individual receiving this score
takes twice as long as the fastest possible performance time to complete the work.
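The difference between the two ratio formulations can be sketched numerically. The example below computes both ratios for a handful of invented performance times and checks how each correlates with the raw times: the time-over-fastest ratio is a pure rescaling of time, while the fastest-over-time ratio is a reciprocal and therefore nonlinear. The function names, the fastest time, and the time values are all hypothetical.

```python
import numpy as np


def pc_original(t, t_star):
    """Fastest-over-actual ratio t*/t; at most 1 when t >= t*."""
    return t_star / np.asarray(t, dtype=float)


def pc_inverted(t, t_star):
    """Actual-over-fastest ratio t/t*; at least 1 when t >= t*."""
    return np.asarray(t, dtype=float) / t_star


t_star = 10.0                                        # hypothetical fastest time (min)
times = np.array([10.0, 12.0, 15.0, 20.0, 40.0])     # invented estimated times (min)

# Pearson correlation of each PC formulation with the raw times.
r_inverted = np.corrcoef(times, pc_inverted(times, t_star))[0, 1]
r_original = np.corrcoef(times, pc_original(times, t_star))[0, 1]
```

In this sketch the inverted ratio correlates perfectly with the raw times, while the original ratio is negatively but not perfectly correlated with them, which illustrates the concern about nonlinear transformations and measures of linear relationship.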

2.7.1.3 The Formulation of Productive Capacity in Time Studies. Although
time studies were not previously reviewed during the discussion of Air Force performance
measurement efforts, their methodology provides a possible PC formulation, so they must now
be reviewed. A time study is generally an Industrial Engineering technique used to derive time
standards for completing certain job tasks and production-type jobs.

A first step in a time study is to clearly specify the operation to be studied. After the
operation is clearly specified, a generally average worker, or operator, is selected to serve as the
subject of the study. The operator is then timed with a stopwatch by a qualified observer, for a
specified number of cycles of the work. After the performance times are collected, the observer
then assigns a performance rating reflective of the production rate of the operator.

The performance rating is used in “equitably determining the time required to perform a
task by the normal operator after the observed values under study have been recorded” (29:325).
In  other  words,  the  performance  rating  is  used  to  adjust  the  time  of  the  actual  operator  so  that 
it reflects the time to be expected for a truly normal operator. If the selected operator worked
faster than normal, as perceived by the observer, the observed time would be adjusted upward
to reflect the normal time. Likewise, if the operator performed slower than normal, the observed
time would be adjusted downward.

A common method of performance rating assumes that the normal operator is associated with
a  rating  of  100,  and  performance  greater  than  normal  is  indicated  by  values  directly  proportional  to 
100  (29:345).  Thus,  a  rating  of  120  would  indicate  that  the  operator’s  performance  is  20%  greater 
than  normal,  while  a  rating  of  80  would  indicate  performance  20%  below  normal  (29:345). 

This time study performance rating can be interpreted as a PC measure for a given operator.
The underlying formulation of the measure could be stated as $(t_{normal}/t) \times 100$, where $t_{normal}$ is
the time it would take a normal operator to do the task under study, and $t$ is the time it takes the
actual operator to complete the task.

This PC formulation offered a third option for standardizing the estimated performance time
data collected under the Productive Capacity Project, provided the reasonable substitution of
$t_{ave}$ for $t_{normal}$ is made. The quantity $t_{ave}$, the average time to complete the task, is virtually
synonymous with $t_{normal}$ and could be computed given the available Productive Capacity Project
data.
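The time-study style formulation can be sketched as follows, with the average of the estimated times standing in for the normal time. This is only an illustration under stated assumptions: the airman labels and time values are invented, and taking a simple arithmetic mean across airmen as the average time is an assumption of the sketch.

```python
def rating_pc(t, t_ave):
    """Time-study style PC: (t_ave / t) * 100, so 100 means average speed."""
    return 100.0 * t_ave / t


# Invented estimated times (minutes) for three airmen on the same task.
task_times = {"airman_a": 12.0, "airman_b": 15.0, "airman_c": 20.0}

# Simple mean across airmen, used here in place of the normal time.
t_ave = sum(task_times.values()) / len(task_times)

ratings = {name: rating_pc(t, t_ave) for name, t in task_times.items()}
```

Under this formulation an airman who is faster than average scores above 100 and one who is slower scores below 100, matching the interpretation of performance ratings of 120 and 80 given above.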

2.7.2 Selecting a Task Weighting Scheme. Applying task weightings would give the
tasks different levels of influence on the computed overall PC measure. This weighting is essential
if one is to allow more important tasks to have greater impact on overall PC. The question is, which
tasks are more important?

Tasks  are  known  to  differ  on  many  dimensions  such  as  criticality,  time  to  complete  them, 
learning  difficulty,  percent  of  the  incumbents  performing  them,  and  percent  of  time  incumbents 
spend on the tasks (40). Any such factor could serve as a weighting factor, depending on the
nature  of  the  overall  PC  measure  being  computed. 

In developing the WTPT, task clusters were weighted by the product of the mean recommended
training emphasis rating and the cumulative percent time spent performing tasks in a
cluster (23:6). The weights were used in determining how many tasks from each cluster to include
in the WTPT. This weighting factor assigned weights (importance) to tasks based on how important
the tasks were perceived to be in the training community and how much time airmen spend doing
them.  This  appeared  to  be  a  reasonable  weighting  factor  for  selecting  tasks  for  the  WTPT,  but 
did  not  appear  so  for  computing  overall  PC  measures.  Since  PC  is  a  quantity-based  measure  of  a 
worker’s  capacity  to  produce,  it  did  not  seem  appropriate  to  let  the  training  emphasis  play  a  part 
in  the  weighting  scheme  since  this  did  not  seem  to  be  an  influencing  factor  on  how  much  an  airman 
can  produce. 

Carpenter and others, in the initial PC research, used a weighting scheme to weight the
estimated performance times of individuals on the 10 task clusters when computing overall PC
(5). But, it is not stated what the weighting scheme was.

2.7.3 Aggregating the Task-Level Data into an Overall Productive Capacity Measure.
As just mentioned, in the initial PC research, Carpenter and others used a weighted average of
the estimated performance times for the task clusters to compute an overall observed PC measure
(5:23). But there was no mention of what the weighting scheme was. Unfortunately, this was the
only research documented by the Air Force where job performance data were collected at the task
or task cluster level and so required aggregation. The literature thus indicates that the only way
task-level data have been aggregated into overall PC measures was through weighted averaging of
the task-level data.
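Weighted averaging of task-level data can be sketched in a few lines. Because the literature does not state the weighting scheme, the weights below are purely hypothetical, as are the task names and scores. Applying the average to task-level PC scores rather than to raw times is also an assumption of this sketch.

```python
def overall_pc(task_pc, weights):
    """Weighted average of task-level PC scores; weights need not sum to 1."""
    total_weight = sum(weights[task] for task in task_pc)
    weighted_sum = sum(task_pc[task] * weights[task] for task in task_pc)
    return weighted_sum / total_weight


# Invented task-level PC scores and invented weights for one airman.
task_pc = {"task_1": 0.80, "task_2": 0.60, "task_3": 0.90}
weights = {"task_1": 2.0, "task_2": 1.0, "task_3": 1.0}

overall = overall_pc(task_pc, weights)

# With equal weights the measure reduces to the simple mean of the task scores.
equal = overall_pc(task_pc, {task: 1.0 for task in task_pc})
```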

2.7.4 Developing Prediction Models. There were three primary studies which involved
the modeling of PC or performance time as the response, and the use of aptitude and experience
as predictors. These were the studies by Carpenter and others, Faneuff and others and AL/HRM
(5) (13) (38). The results of the studies were varied. Carpenter and others reported the highest
$R^2$s of any of the studies using a two-predictor, first-order logistic model (5). Faneuff and others
found that a first-order linear model fit their PC data better than a logistic model (13). Finally,
AL/HRM found relatively good fit to untransformed time data using learning curve models (38).

2.8 Research Direction.

The reviewed literature provided some definite direction for the current research. First, the
literature suggested four possibilities for meeting the first research objective, formulating a PC
measure from task-level time data:

1. t*/t

2.  t/t- 

3. (*-^)xl00  - 

4. t

Previous research offered only limited insight into how to meet the second research objective,
selecting a task weighting scheme. In developing the WTPT, tasks were weighted by the mean
recommended training emphasis rating and the cumulative percent time spent performing tasks
in a cluster (23:C). However, such a weighting scheme did not appear appropriate for the current
research because of the nature of the PC measure. (PC is a quantity-based measure of a worker’s
capacity to produce, and weighting it by mean recommended training emphasis rating and
cumulative percent time spent did not appear to make sense.) This was the only weighting scheme
discussed in the literature. Since it did not seem appropriate for the current research, the literature
provided no particular direction for the second research objective.

The literature likewise offered only limited direction for the third objective, aggregating the
task-level PC data. The only aggregation method discussed in the literature was weighted averaging
of the task cluster-level data (5:6). However, this seemed to be a reasonable aggregation method
and was chosen as the method of aggregation for this thesis.

The literature did provide significant guidance for the last and most important objective,
developing prediction models. Three models having relevance to the current research (response of
PC or performance time) were discussed in the literature. These models were:

1. Logistic model for predicting PC

2. Linear model for predicting PC

3. Learning curve model for predicting performance time

Since there was little or no guidance provided for the second and third research objectives,
the research direction suggested by the literature can best be summarized in Figure 9. In Figure 9,
the individual boxes indicate the response formulation and model type combinations which existed,
given previous studies. The darkened boxes indicate which combinations were inappropriate due
to response formulation and model type incompatibility. Written in the appropriate boxes are the
studies that were done for a given response formulation and model type combination. An empty
box indicates no studies have been done for a particular combination.

It was decided that this thesis would incorporate one of the PC formulation and model
type combinations for which a previous study had been done. This was to take advantage of the
information available as a result of the previous study. This left three choices:

1. Logistic model with the t*/t formulation

2. Linear model with the t*/t formulation

3. Learning curve model with the t formulation

Figure 9. Graphical Representation of the Research Direction Suggested by the Literature
[Figure 9 axes: Response Formulation by Model Type (Logistic, Linear, Learning Curve)]

Because Carpenter and others, using the logistic model with the t*/t formulation, obtained
higher R² values than Faneuff and others did with the linear model, the first combination above was
judged a better alternative than the second. And, because the learning curve model seemed
inappropriate for modeling overall job performance, the first combination also appeared better than
the third. It was thus decided that, given the estimated time data collected under the Productive
Capacity Project, the response, PC, would be formulated as t*/t, and the regression model for
predicting it would take the form of the logistic model. The remainder of this thesis documents
the research performed to develop the regression-based job performance model, using the PC
formulation t*/t and the logistic regression model.

III. Methodology


The last two chapters were designed to provide the reader with a substantial background on
modeling, and on the Air Force’s job performance measurement R&D leading up to and including the
first research effort under the Productive Capacity Project (21). This chapter describes the steps
taken in developing the experimental mathematical models for predicting the job performance of
Air Force Aerospace Ground Equipment (AGE) personnel given the estimated task performance
time, aptitude and experience data collected under the Productive Capacity Project. Development
of such descriptive models was the primary research objective of this thesis. This chapter
begins with a brief overview of the subjects and data used to meet the research objectives. Following
that overview, the specific steps taken to meet each research objective are discussed.

3.1  Subjects. 

The experimental subjects were 204 airmen and NCOs studied by Leighton and others under
the Air Force’s Productive Capacity Project (21). The subjects were assigned to Air Force specialty
454X1, AGE. AGE personnel are the airmen responsible for inspecting, maintaining and repairing
necessary ground equipment used to support aircraft and Ground Launched Cruise Missile (GLCM)
systems (8). Such ground equipment is called aerospace ground equipment and includes items such
as electrical generators, heaters, hydraulic bomblifts, and air compressors.

The subjects were from the Air Force bases listed in Table 5. The procedures used to select
the experimental subjects are described briefly in Section 2.3.2.5 and in depth in the technical
paper by Leighton and others (21). Figure 10 through Figure 13 describe some notable sample
characteristics.

As can be seen in Figure 10, the vast majority of the sample were E-3s (Senior Airmen) or
E-4s (Sergeants). Also, Figure 11 shows that most of the sample was from the 5 skill level, with
much smaller numbers being from the 3 and 7 levels. In reference to the subjects’ job experience,
Figure 12 shows the majority of the sample were likely first-term airmen (121 out of 204, or 59.31%),
as indicated by job experience between zero and 48 months. (Some retrainees might also show
job experience less than or equal to 48 months but have significantly more Air Force experience.)
Note also that 88.73% of the sample had eight years or less experience, and only about 2.45% had
more than eight years of experience. Also, observe that the experience data appear positively skewed,
meaning the preponderance of subjects were associated with job experience measures near the low
end of the experience range.

Figure 10. Frequency Distribution and Pie Chart of Subject Grade

Figure 11. Frequency Distribution and Pie Chart of Subject Skill Level

Figure 13. Frequency Distribution and Pie Chart of Subject Aptitude


The aptitude distribution in Figure 13 indicates that the sample Mechanical aptitude
percentile scores were distributed between 46 and 99. The ASVAB Mechanical score (M) distribution
was considered important because the M score was used as the aptitude predictor in the regression
modeling. The M score was selected as the aptitude predictor because there was a minimum M
score requirement for entering the AGE specialty. This indicated that mechanical aptitude had
been previously identified by the Air Force as being somehow related to performance in the AGE
specialty; thus M seemed appropriate as an aptitude predictor variable for the current study. The
Mechanical aptitude score distribution was restricted to values generally greater than 51 because
this is the current minimum Mechanical aptitude score requirement for entering the job. (An
additional requirement of an Electronic percentile score (E) of at least 33 also exists for entering the
job (8).)


Table 9 provides a two-way frequency distribution of subject aptitude by job experience. Recall
that aptitude and experience are the predictor variables. The two-way frequency distribution was
provided to offer insight as to what the true effective range of the estimated regression model is.
In other words, sparse or null cells in regions of Table 9 indicate that the regression model should
be interpreted cautiously in such regions. This is because the shape of the estimated response
surface in such areas was determined by relatively few data points. Note that the matrix depicted
in Table 9 is very sparse beyond 96 months of job experience. The estimated models may thus be
tenuous in that region.

Table 9. Two-Way Frequency Distribution of Sample Aptitude by Job Experience

Months of Job          ASVAB Mechanical Percentile Score
Experience     46-55  56-65  66-75  76-85  86-95  96-99  Unknown  Total
0-12               0      5      1      4      3      0        0     13
13-24              4      5     12      5      2      2        1     31
25-36              1     10     12      9      8      4        0     44
37-48              2      8     10      6      5      1        1     33
49-60              1      6      8      7      3      2        3     30
61-72              0      0      3      1      2      1        5     12
73-84              2      2      0      1      5      0        3     13
85-96              0      1      2      1      0      0        1      5
97-108             1      0      0      0      0      0        0      1
109-120            0      0      0      0      1      0        4      5
121-132            0      0      1      0      0      0        1      2
133-144            0      0      0      0      0      0        0      0
145-156            0      0      0      0      0      0        0      0
157-168            0      0      0      0      1      0        2      3
169-180            0      0      0      0      0      0        1      1
> 180              0      1      0      0      0      0        0      1
Unknown            1      2      5      0      1      0        1     10
Total             12     40     54     34     31     10       23    204
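
The tabulation behind a two-way frequency distribution like Table 9 can be sketched as follows. This is only an illustration: the function names, the bin edges (inferred from the Table 9 headings), the sparseness threshold of 2, and the sample records are all assumptions, not the procedure or data actually used in this research.

```python
from collections import Counter


def experience_bin(months):
    """Map whole months of job experience to a 12-month bin label."""
    if months is None:
        return "Unknown"
    if months > 180:
        return "> 180"
    if months <= 12:
        return "0-12"
    low = ((months - 1) // 12) * 12 + 1
    return f"{low}-{low + 11}"


def aptitude_bin(score):
    """Map an ASVAB Mechanical percentile score (46 to 99) to a bin label."""
    if score is None:
        return "Unknown"
    low = 46 + ((score - 46) // 10) * 10
    return f"{low}-{min(low + 9, 99)}"


def two_way_counts(subjects):
    """Count subjects in each (experience bin, aptitude bin) cell."""
    return Counter((experience_bin(m), aptitude_bin(a)) for m, a in subjects)


# Hypothetical subjects as (months of experience, M score); not thesis data.
subjects = [(6, 60), (18, 70), (18, 72), (100, 50), (None, 88), (30, None)]
counts = two_way_counts(subjects)

# Cells with very few subjects mark regions where a fitted model is weakly supported.
sparse_cells = [cell for cell, n in counts.items() if n < 2]
```

Cells that never occur are simply absent from the counter, so a null cell reads as a count of zero when looked up.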


In summary, the sample tended to be E-3s and E-4s, with skill levels around 5. Further, the
airmen tended to have less than eight years of job experience, and aptitude covering the somewhat
restricted range of 46 to 99.


3.2 Data.


As previously mentioned, the data used in this thesis were collected under the Air Force’s
Productive Capacity Project, by Leighton and others, between March and September 1990 (21).
A brief overview of the Leighton and others research, to include data collection, was included in
Section 2.3.2.5. Again, the reader is referred to (21) for a complete description of that research.

The primary data used were the estimated task performance times provided by each subject’s
supervisor. Associated with each subject was his or her supervisor’s estimates of how fast he or
she could complete each of 50 job tasks while simultaneously working as quickly as possible and
maintaining an acceptable level of task quality. Complete task descriptions are included in Table 22
at Appendix A.

The tasks selected for analysis were those that tend to be performed by fairly junior and
intermediate personnel. The tasks included mostly hands-on production-type tasks as opposed to
the supervisory or management tasks that more senior personnel perform. With this in mind, the
sample described in the previous section appeared to be a fairly reasonable sample as indicated by
the grade, skill level and experience characteristics provided.

Not  all  subjects  had  a  complete  set  of  50  task  ratings.  Some  supervisors  did  not  provide  all 
ratings  for  all  subjects.  As  a  result,  a  relatively  small  number  of  missing  values  existed. 

As previously indicated, other primary data used for the analyses included the subjects’
self-reported level of job experience, and the subjects’ Mechanical composite score from the ASVAB
obtained when applying for enlistment. These data were used as predictors in the mathematical
prediction of the subjects’ productive capacity. As previously mentioned, the M aptitude score
was chosen as the aptitude predictor because scores on this composite help determine a recruit’s
eligibility for entering the AGE specialty.

Secondary data of interest were the subjects’ Job Knowledge Test percent correct scores
(JKT), the supervisors’ global or overall estimates of the subjects’ PC (GPC), and a PC measure
derived from actual stopwatch times of a limited subsample of the subjects (MTPC). These
measures were used as a basis for comparison for the regression model results derived in this thesis.
Figure 14 provides a graphical representation of the data used in the analyses.

Figure 14. Graphical Representation of the Data Used in the Analyses


3.3  Procedure. 

The  preceding  sections  provided  a  brief  overview  of  the  experimental  subject  sample  and  the 
relevant  data  collected.  Discussion  may  now  proceed  to  the  actual  steps  taken  to  meet  the  research 
objectives.  The  reader  may  wish  to  keep  in  mind  that  although  the  primary  research  objective 
was  to  develop  regression-based  job  performance  models,  the  first  three  research  objectives  (see 
Section  1.3)  were  concerned  only  with  the  response  information,  the  estimated  task  performance 
times. 


3.3.1 Formulating a Productive Capacity Measure from Estimated Task Performance Times.

As mentioned in Chapter 1, it was necessary to transform the estimated task performance times
to give them interpretability and to allow them to be aggregated across tasks.

In reference to Figure 4, the formulation of a PC measure from the raw time data is associated
with the first phase of the model building process, data collection and preparation. Of course, the PC
formulation was only concerned with the preparation part because the data had already been
collected. In reference to Figure 14, the PC formulation involved editing and transforming the data
under the Est. Time on Task i columns.

3.3.1.1 Defining Task-Level Productive Capacity. Task-level PC was defined
according to the Carpenter and others formulation t*/t (5:21). Recall that t* is the fastest possible
completion time for a given task, and t is a subject’s completion time. In reference to Figure 14,
the ts are the entries under the Est. Time on Task i columns. The t*s were derived from the
minimum observed time in each such column.

As explained at the end of the previous chapter, the t*/t formulation was selected over the
other possible formulations. The t*/t formulation has some desirable characteristics which made
it a reasonable choice. First, unlike the other formulations, that of Carpenter and others yields
values that range between zero and one, thus lending themselves to logistic regression models.
Recall that it was the logistic regression models of Carpenter and others that yielded the highest
reported model R² for any of the PC studies (5) (13) (38). Second, the Carpenter and others
formulation maintains the desirable property of being nicely interpretable. It can be interpreted as
an individual’s work capacity as a proportion of maximum possible capacity.

3.3.1.2 Editing the Raw Estimated Time Data. Before the PC measures were
computed from the estimated task performance times, the estimated times were edited to control
for serious outliers. As Neter, Wasserman and Kutner (1990) point out, “Outliers can cause great
difficulty.” (27:121) They describe how, when least-squares estimation is used in trying to predict a
response, a fitted surface can be pulled disproportionately toward an outlier. They suggest
discarding an outlier “if there is direct evidence that it represents an error in recording, a miscalculation,
a malfunctioning of equipment, or a similar type circumstance.” (27:122) Editing of the raw
estimated time data was justified because the format in which the time estimates
were collected was a type of free-response format. This means that there was no limitation on the
answers that could be given. Recall that when the supervisors provided their time estimates, they
were provided with previously created benchmark scales showing SMEs’ opinions as to what the
fastest, normal and slowest completion times were. However, these were to be used as
take-it-or-leave-it guidance for the supervisors in making their estimates. The supervisors were not
required to keep their estimates between the fast and slow benchmark times if they did not want to.
This resulted in a free-response format. One problem with the free-response format is that responses
have a tendency to vary widely, even to extremes. This necessitated data editing.

Figure 15. Histograms of the Raw and Edited Estimated Times for Task G179

To control for potential outliers, the raw time estimates for a given task were edited by pulling
in all values that were beyond three standard deviations from the mean. In other words, all values
beyond three standard deviations were recoded to a value of the mean ± three standard deviations.
This was done because for a distribution of measurements that is approximately bell-shaped, the
interval of the mean ± three standard deviations will contain almost all the measurements (25:9).
Thus, anything beyond these limits was considered an outlier, and recoded. Values beyond three
standard deviations from the mean were recoded and not discarded because these outlying data
were not considered to be transcription errors or results of some other error, but estimates from
supervisors who did not happen to agree with the range of times provided. The recoding was thus
done to retain the information contained in the outlying points while keeping some consistency in
the ratings and keeping the variance at a reasonable level. Figure 15 provides an example of the
effects of the editing on the raw data for one task, G179.
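
The recoding rule described above amounts to clamping each task's estimates to the interval of the mean ± three standard deviations. A minimal sketch follows; the function name, the use of the sample (n - 1) standard deviation, and the example times are assumptions made for illustration only.

```python
from statistics import mean, stdev


def pull_in_outliers(values, k=3.0):
    """Recode values beyond mean +/- k standard deviations to the nearest limit."""
    center = mean(values)
    spread = stdev(values)  # sample SD; which SD variant was used is an assumption
    low, high = center - k * spread, center + k * spread
    return [min(max(v, low), high) for v in values]


# Hypothetical supervisor estimates (minutes) for one task; not thesis data.
raw_times = [10.0] * 19 + [100.0]
edited_times = pull_in_outliers(raw_times)
```

Note that the limits are computed from the unedited data, so an extreme estimate inflates the standard deviation used to judge it. With very small samples no single value can fall beyond three sample standard deviations at all.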
3.3.1.3 Computing the Productive Capacity Measure at the Task Level. As
previously mentioned, task-level PC was defined as t*/t. In the actual computation, t* for a task was
computed as .99 × (minimum estimated time for the task (after editing)). Again, in Figure 14, the
ts are the entries under the Est. Time on Task i columns and the t*s are derived from the minimum
estimated time in each such column. This computation of t* accounted for the fact that the true
fastest possible time was probably not recorded for this sample, but is likely somewhat less than
the sample minimum. After computing t* in this fashion, the PC measure t*/t was computed for
each individual for each task.
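
As a minimal sketch of this computation for a single task column: the function name, the use of None for a missing supervisor rating, and the example times are assumptions for illustration.

```python
def task_level_pc(edited_times, shrink=0.99):
    """Compute PC = t*/t for one task, with t* = shrink * (minimum edited time)."""
    observed = [t for t in edited_times if t is not None]
    t_star = shrink * min(observed)
    # Missing ratings stay missing; every observed PC falls strictly between 0 and 1.
    return [None if t is None else t_star / t for t in edited_times]


# Hypothetical edited times (minutes) for one task; None marks a missing rating.
pcs = task_level_pc([10.0, 20.0, None, 40.0])
```

Because t* is set slightly below the sample minimum, even the fastest subject receives a PC of .99 rather than one.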

3.3.1.4 Editing the Productive Capacity Measure at the Task Level. After
computing the task-level PCs, a review of their histograms indicated that the editing of the raw
estimated times was not enough to control for serious outliers. Several of the task PC distributions
still showed obvious outliers. This indicated the need for further editing.

The task-level PCs were edited in much the same way as the raw estimated times. For each task,
PC measures beyond ± three standard deviations from the mean were pulled in to values of the
mean ± three standard deviations. Unlike the editing of the raw estimated times, this editing
influenced the interpretability of the PC measure. Recall that PC is interpreted as an individual’s
output as a proportion of maximum possible output. As an example of how the interpretability
was influenced, consider a case where the mean ± three standard deviations defines the range
of .2 to .8. Assume that all values outside of this range are considered extreme outliers and recoded
as .2 or .8, depending on which side of the interval they fall. The recoding is done because values
outside of the range mean ± three standard deviations are considered impossible. After recoding,
the range of PC values is not zero to one but .2 to .8. Since .2 represents the new lowest possible
output level, it must correspond to a PC of zero. Likewise, since .8 represents the new highest
possible output level, it must correspond to a PC of one. To make .2 and .8 correspond to zero and
one respectively, the rescaling transformation in Equation 17 was made on the edited PC values for
the task. The rescaling ensured the interpretability of the PC measure as an individual’s output
as a proportion of maximum possible output.

The  rescaling  transformation  function  in  Equation  17  is  a  linear  function  of  the  original  PC 
data.  This  means  that  the  transformed  data  will  exhibit  exactly  the  same  linear  associations  (same 
correlation  coefficient,  same  linear  regression  results,  etc.)  with  other  variables  as  the  untransformed 
data.  The  rescaling  may,  however,  influence  logistic  regression  results  because  the  logistic  model  is 
not  a  linear  model  in  its  original  form. 
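
The invariance claim can be checked with a short derivation. The notation here is assumed for illustration: X is the edited PC, Y is any other variable, and the rescaling is written generically as aX + b with a > 0. The block uses \operatorname, which requires the amsmath package.

```latex
% Assumed notation: X = edited PC, Y = any other variable, rescaled PC = aX + b, a > 0.
\[
r_{aX+b,\,Y}
  = \frac{\operatorname{Cov}(aX+b,\,Y)}{\sigma_{aX+b}\,\sigma_{Y}}
  = \frac{a\,\operatorname{Cov}(X,Y)}{|a|\,\sigma_{X}\,\sigma_{Y}}
  = r_{X,Y}
  \qquad (a > 0)
\]
% The logit is not linear, so the same shortcut does not carry over:
\[
\operatorname{logit}(aX+b) = \ln\frac{aX+b}{1-(aX+b)}
  \neq c\,\operatorname{logit}(X) + d \quad \text{in general}
\]
```

The first line shows why correlations and linear regression fits are unchanged by the rescaling. The second shows why a model fitted on the logit scale can change.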

A small adjustment made to the rescaled values was to recode the rescaled value of zero to
.01, and the rescaled value of one to .99. This was to ensure that the logistic model would be defined
for all computed rescaled values. (The range of the logistic function does not include zero or one.)

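\[ PC_{rescaled} = \frac{PC_{edit} - PC_{min}}{PC_{max} - PC_{min}} \tag{17} \]

where

PC_{rescaled} = the rescaled PC value for the task
PC_{edit} = the edited PC value for the task
PC_{min} = the minimum edited PC value for the task
PC_{max} = the maximum edited PC value for the task.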
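
The rescaling and endpoint adjustment can be sketched as below. The min-max form is one reading of the rescaling, consistent with the .2 to .8 example in the text; the function name and the example values are assumptions for illustration.

```python
def rescale_pc(edited_pcs, floor=0.01, ceiling=0.99):
    """Min-max rescale edited PCs to [0, 1], then recode exact 0 and 1 to floor and ceiling."""
    pc_min, pc_max = min(edited_pcs), max(edited_pcs)
    if pc_max == pc_min:
        raise ValueError("rescaling is undefined when all edited PCs are equal")
    rescaled = [(pc - pc_min) / (pc_max - pc_min) for pc in edited_pcs]

    def adjust(value):
        # Only the exact endpoints are recoded; interior values are left alone.
        if value == 0.0:
            return floor
        if value == 1.0:
            return ceiling
        return value

    return [adjust(value) for value in rescaled]


# The .2 to .8 example from the text, with one hypothetical interior value.
rescaled_pcs = rescale_pc([0.2, 0.5, 0.8])
```

The endpoints are recoded because a logit-type transformation of exactly zero or one is undefined.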

After reviewing the histograms of the edited, rescaled PC values, 17 tasks still showed serious
outliers. These were G171, G179, G181, H238, 12.91, I259, I264, I265, I283, 1281, I299, J332, J,317,
L406, M444, N48(), P549. One final round of editing and rescaling was applied. This time, outliers
from the 17 tasks were identified through subjective judgement by the author. The outliers were then
pulled in to the closest reasonable observed value. The reedited PCs were then rescaled according
to Equation 17, and the adjustments to the zero and one values were made. This completed the
data editing for the PC task variables. Figure 16 provides an example to show the effects of the
PC editing and rescaling for task G179.

Figure 16. Histograms of the Productive Capacity Values in the Editing Process for Task G179

After the final editing, summary statistics were computed for the task-level PCs for
comparison to the summary statistics for the associated raw estimated times. This comparison was made
primarily to determine if the editing had the desired effects of outlier and variance control. Of
primary interest was the coefficient of variation. The coefficient of variation, CV, is a measure of
the dispersion of the distribution of a variable. The computational formula for CV is shown in
Equation 18 (26:388).


\[ CV = \frac{s}{\bar{x}} \tag{18} \]

where

CV = coefficient of variation

s = standard deviation

\bar{x} = arithmetic mean.

CV expresses a distribution’s dispersion relative to the distribution mean, thus making the
measure comparable across variables with different distributions. A CV of less than one is generally
indicative that a distribution is not highly variable, thus partially indicating that the distribution
is not subject to severe outliers.
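
The coefficient of variation is the standard deviation divided by the arithmetic mean. A minimal sketch, assuming the sample standard deviation and using made-up values:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return s / x-bar, using the sample standard deviation (an assumption)."""
    return stdev(values) / mean(values)


# Hypothetical measurements; not thesis data.
cv = coefficient_of_variation([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Because CV is unit-free, the value for a raw time variable (in minutes) can be compared directly with the value for the corresponding PC variable (a proportion).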

Having computed CV for both the raw estimated times and final PC measures for each task,
it was possible to assess the effects of the t*/t transformation, the editing of the raw times, and
the editing of the PCs on the response data.

3.3.2 Selecting a Task Weighting Scheme. The selection and application of a task
weighting scheme still involved the data preparation phase of the regression model building process
depicted in Figure 4. In reference to Figure 14, the selection of a weighting scheme involved the
identification of appropriate weights for each Est. Time on Task i column to give the data derived
from each column an assigned level of importance. This was to give the task-level data varying
levels of influence when computing an overall measure.

Because the PC measure is time-based and reflective of overall worker output, it seemed most
appropriate to weight the tasks by the average amount of time individuals spend doing each task.
If the individual under study is slow on some tasks and fast on others, it is necessary to consider
the relative amount of time spent on each task to accurately assess overall capacity. To illustrate,
consider the extreme situation in which a worker is exceptionally fast on all but one job task.
And, say the worker is exceptionally slow on that one outstanding task. If the job requires the
individual to perform the outstanding task 99% of the time, his or her productive capacity should
be comparatively low. This is despite the fact that his or her performance is exceptionally good on
the other numerous tasks that are infrequently performed.

The Occupational Measurement Squadron collects relative performance time data as part of
its periodic surveys of the AFSs (40). One such measure outlined in the Occupational Survey
Report is Average Percent Time Spent Performing Duties (40:23). In the report, the data are broken
out by skill level. The task weightings used in this thesis were computed as an average of the
average percent time spent for the skill levels that would generally be expected to do the types of
hands-on tasks under study (skill levels 3, 5 and 7). Because of the nature of the available average
percent time spent data, weights had to be derived for each duty area, and the duty area weight
was applied to each task from that duty area.
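
The weight derivation just described can be sketched as follows. The percent-time figures are invented for illustration, and the lookup assumes that the leading letter of a task identifier (for example the G in G179) names its duty area, which is an inference from the identifiers rather than something stated here.

```python
def duty_area_weights(percent_time, skill_levels=(3, 5, 7)):
    """Average each duty area's percent time spent across the given skill levels."""
    return {
        duty: sum(by_level[level] for level in skill_levels) / len(skill_levels)
        for duty, by_level in percent_time.items()
    }


def task_weight(task_id, weights):
    """Look up a task's weight from its duty area (assumed: first letter of the ID)."""
    return weights[task_id[0]]


# Hypothetical percent-time figures, not values from the Occupational Survey Report.
percent_time = {
    "G": {3: 30.0, 5: 24.0, 7: 12.0},
    "I": {3: 10.0, 5: 14.0, 7: 6.0},
}
weights = duty_area_weights(percent_time)
```

Every task in a duty area receives the same weight under this scheme, so the weights separate duty areas but not tasks within a duty area.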

Overall,  the  selected  weighting  scheme  was  designed  to  give  greatest  importance  to  tasks  from 
the  duty  areas  that  are  performed  most  often  by  3,  5  and  7  skill  level  airmen. 

3.3.3 Aggregating the Task-Level Data into an Overall Productive Capacity Measure. As
with the first two research objectives, this one dealt with the data preparation phase of the model
building process depicted in Figure 4.

Having computed the task weights, it was possible to define and compute aggregate or overall
PC per individual. The following discussion describes how this was done.

3.3.3.1 Defining and Computing Overall Productive Capacity. To derive a single
PC measure for an individual from his or her task-level data, it was necessary to somehow collapse
task-level ratings into a single overall measure. Figure 17 presents a graphical illustration of the
task-level data aggregation.

Figure 17. Graphical Representation of the Task-Level Data Aggregation


This aggregation of the data was accomplished through weighted averaging. Aggregate or
overall PC, then, was defined as a weighted average of the subjects’ final edited and rescaled
task-level PCs. Weighted averaging was used because previous studies had successfully used weighted
averaging as an aggregation method (5). Also, weighted averaging is a commonly accepted and
frequently applied statistical technique used to aggregate data (of the same units) that differ on
known dimensions. Equation 19 shows a mathematical representation of how the aggregate PC
measures were defined (21).


\[ PC_{wavg} = \frac{\sum_{i=1}^{n} w_i \, PC_i}{\sum_{i=1}^{n} w_i} \tag{19} \]

where

PC_{wavg} = a weighted average of task-level PCs

PC_i = the individual’s PC on task i

w_i = the weight for task i

n = the number of task measurements for the individual.
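
A minimal sketch of the weighted averaging for one individual follows. Dividing by the sum of the weights of the tasks actually rated (so that missing ratings do not pull the average down) is an assumption, as are the function name and the example values.

```python
def weighted_average_pc(task_pcs, weights):
    """Weighted average of task-level PCs, skipping tasks with a missing PC (None)."""
    pairs = [(pc, w) for pc, w in zip(task_pcs, weights) if pc is not None]
    total_weight = sum(w for _, w in pairs)
    return sum(pc * w for pc, w in pairs) / total_weight


# A worker who is fast (PC = .9) on a rarely performed task and slow (PC = .1)
# on a task performed 99% of the time; the values are hypothetical.
overall_pc = weighted_average_pc([0.9, 0.1], [1, 99])

# The same function with one missing task rating.
overall_with_missing = weighted_average_pc([0.5, None, 0.7], [2, 5, 2])
```

In the first case the overall PC is about .108, close to the PC on the dominant task, whereas a simple average of the two task PCs would be .5.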

A simple, or unweighted, average was also computed for strictly comparative purposes. The
unweighted and weighted average PC values were compared through summary statistics and
correlational analyses. The correlation statistic used was the Pearson product-moment correlation
coefficient, r (26:429). The computation for r is shown in Equation 20.


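\[ r = \frac{n \sum_{i=1}^{n} U_i V_i - \sum_{i=1}^{n} U_i \sum_{i=1}^{n} V_i}{\sqrt{\left[ n \sum_{i=1}^{n} U_i^2 - \left( \sum_{i=1}^{n} U_i \right)^2 \right] \left[ n \sum_{i=1}^{n} V_i^2 - \left( \sum_{i=1}^{n} V_i \right)^2 \right]}} \tag{20} \]

where

r = Pearson product-moment correlation coefficient

i = observation number

n = number of observations of U and V

U_i = observation i of a variable U

V_i = observation i of a variable V.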


The Pearson correlation coefficient is a measure of linear association between two variables.
The coefficient ranges between -1.0 and 1.0. Measures near -1.0 and 1.0 indicate a high degree of
linear relationship. A negative coefficient means the measures are inversely related, or one measure
tends to be high when the other is low.
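
As an illustration of Equations 19 and 20, the following Python sketch computes weighted and unweighted average PCs for a few subjects and then correlates the two measures. The subject data, the weights, and the function names are hypothetical and chosen only to show the arithmetic; they are not the thesis data.

```python
import math

def weighted_average(pcs, weights):
    """Weighted average of task-level PCs, normalized by the sum of the weights."""
    return sum(w * pc for w, pc in zip(weights, pcs)) / sum(weights)

def pearson_r(u, v):
    """Pearson product-moment correlation (computational form)."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(a * b for a, b in zip(u, v))
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    return (n * suv - su * sv) / math.sqrt((n * suu - su ** 2) * (n * svv - sv ** 2))

# Hypothetical task-level PCs for four subjects on three tasks (not thesis data).
task_pcs = [
    [0.20, 0.35, 0.10],
    [0.40, 0.45, 0.30],
    [0.55, 0.50, 0.60],
    [0.25, 0.30, 0.15],
]
task_weights = [11.0, 13.0, 4.33]  # illustrative weights, one per task

weighted = [weighted_average(row, task_weights) for row in task_pcs]
unweighted = [sum(row) / len(row) for row in task_pcs]
r = pearson_r(unweighted, weighted)
print(f"r between unweighted and weighted averages: {r:.3f}")
```

A correlation near 1.0 in such a comparison would suggest that the weighting changes the ranking of subjects very little.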

The unweighted and weighted average PCs were compared via summary statistics and $r$ to
determine if the measures were unique. The idea was that if the weighted and unweighted measures
were statistically similar and highly positively correlated, then the weighting added no uniqueness
to the overall measure. Similar summary statistics and high correlation would thus indicate that
the weighting scheme added nothing to the computation of overall PC beyond what could be gained
by simple averaging.

Figure 18. Graphical Representation of the Regression Models Developed

3.3.4 Developing Prediction Models. After computing the aggregate PC variables, it was
possible to begin the modeling phase. The following sections describe the steps taken to complete
the regression modeling at the aggregate level and also at the task level. Figure 18 provides a
graphical representation of the regression models to be developed. The goal of the regression
analysis was to determine the $\beta$ parameter estimates that would define the mathematical function
of the predictors depicted in the model box.

3.3.4.1 Editing the Predictor Variables. The previously discussed research objectives
each dealt with preparation of the response data for the regression modeling of job performance.
Like the response data, the predictor data had to be prepared in accordance with the first
phase of the model building process depicted in Figure 4. In reference to Figure 14, the graphical
data file depiction, the following editing procedures were applied to the columns under the heading
Predictor Variables.

As with outlying response values, outlying predictor values can be problematic. "Outlying
cases may involve large residuals and often have dramatic effects on the fitted least squares
regression function." (27:392) Recall that the predictor variables are aptitude (ASVAB Mechanical
percentile scores) and experience (months of job experience). Frequency distributions and pie charts
for these variables were provided in Table 9, Figure 12 and Figure 13.

The frequency distribution of aptitude scores indicated that there were no obvious outliers or
other apparent problems with the aptitude data. The scores appeared near normally distributed
between 46 and 99. The experience variable's distribution appeared positively skewed with the
vast majority of the observations (88.73%) having 96 or fewer months of job experience. Note that
the frequency distribution shows one potential outlier with a value greater than 180 months. The
actual value recorded for this observation was 283 months, well beyond the next highest value of
169. A review of the data file showed that no subject had more than 195 months of total Air Force
experience. It is of course impossible to have more Air Force job experience than overall Air Force
experience; thus, the value of 283 was identified as a miscoding. The case was dropped from further
analyses.
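
The consistency check described above (job experience cannot exceed total Air Force experience) can be sketched in a few lines of Python. The record layout, field names, and the total-service value paired with the 283-month entry are assumptions made for illustration; only the values 283, 169, and 195 come from the text.

```python
def flag_miscoded_experience(records):
    """Return IDs of records whose job experience exceeds total service time."""
    return [r["id"] for r in records if r["job_months"] > r["total_af_months"]]

# Hypothetical records; field names are illustrative, not from the thesis data file.
records = [
    {"id": "A", "job_months": 36, "total_af_months": 40},
    {"id": "B", "job_months": 169, "total_af_months": 195},
    {"id": "C", "job_months": 283, "total_af_months": 120},  # impossible value
]

flagged = flag_miscoded_experience(records)
cleaned = [r for r in records if r["id"] not in flagged]
print("Dropped cases:", flagged)
```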

3.3.4.2 Fitting the Regression Models. The editing of the predictor variables
concluded  the  data  preparation  phase  of  the  model  building  process.  The  next  phases,  according  to 
Figure  4,  were  reduction  of  the  number  of  predictor  variables  and  model  refinement  and  selection. 
The  following  discussion  describes  these  phases  applied  to  the  current  research. 

Recall from the literature review that the model which yielded the highest $R^2$s among the
Air  Force’s  PC  studies  was  the  logistic  model  used  by  Carpenter  and  others  (5:21)  (13)  (38).  With 
this  result  in  mind,  a  logistic  model  was  fit  to  the  PC  data  for  each  of  the  50  tasks,  and  also  to  the 
weighted  and  unweighted  average  PCs.  The  logistic  model  and  logistic  regression  were  discussed 
only  briefly  in  the  previous  chapter.  Following  is  a  more  in-depth  discussion. 

The  logistic  regression  model  is  a  model  that  is  frequently  applied  in  situations  where  the 
response  variable  is  binary,  zero  or  one.  In  such  situations,  the  observations  are  often  classified  into 
groups based on values of one or more predictor variables. Thus, grouping of observations allows
the individual zero/one response observations to be collapsed into a proportion for the group. The
zero/one  response  often  indicates  an  observation’s  possession  (one)  or  lack  (zero)  of  some  trait 
of  interest.  Grouping  observations  collapses  the  responses  into  a  single  measure  representing  the 
proportion  of  observations  possessing  the  trait.  The  logistic  function,  being  restricted  to  the  range 
between  zero  and  one,  is  ideally  suited  for  modeling  such  proportions  given  known  levels  of  the 
predictors.  Logistic  regression  is  thus  frequently  used  to  predict  the  proportion  of  individuals  in  a 
given  group  which  possess  the  trait  of  interest. 

A general form of the logistic model is expressed in Equation 21 (17:25-26).

$$\pi(x) = \frac{e^{g(x)}}{1 + e^{g(x)}} + \epsilon \tag{21}$$

where

$\pi(x)$ = a response variable (ranging from 0 to 1)

$g(x)$ = some function of the predictor variables (linear in the $\beta$ parameters)

$\epsilon$ = the model error terms.


Note that the logistic model is not a linear model because it is not linear in the $\beta$ parameters
which would be contained in the function $g(x)$. (The function $g(x)$ is linear, however. This fact will
be used later.)

The logistic function is generally S-shaped as depicted in Figure 19 and Figure 20. These
represent example plots of logistic functions with one and two predictors, respectively. The addition
of higher-order and interaction terms and the nature of the relationship between the variables can
cause the logistic function to take on shapes other than the standard S-shape. This will be shown
in Chapter 4.
As mentioned, the logistic model is not linear in its original form. But, it can be linearized
using the logit transformation. The logit transformation is shown in Equation 22.

$$\text{Logit of } \pi(x) = \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) \tag{22}$$

Transforming the response (a proportion) through the logit transformation allows the logistic
function to be written in linear form as in Equation 23. The linearized logistic function is called
the logit response function (28:583).


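$$\ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = g(x) + \epsilon \tag{23}$$

where

$\pi(x)$ = a response variable (ranging from 0 to 1)

$\ln\left(\frac{\pi(x)}{1 - \pi(x)}\right)$ = the logit of the response, $\pi(x)$

$g(x)$ = some function of the predictor variables (linear in the coefficients)

$\epsilon$ = the model error terms.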

Although the logistic model can be expressed as a linear model, standard linear regression
cannot be applied if the response data are originally binary and the analyst wishes to apply standard
linear regression inferential statistics. Recall from the linear regression discussion in Section 2.1.1
that application of the linear regression inferential statistics requires the model assumption that
the error terms, $\epsilon$, are distributed $N(0, \sigma^2)$. It so happens that when the response data are originally
binary, the error terms are not normally distributed, but binomially distributed (17:7). Also, there
is nonconstant error variance (heteroscedasticity) across varying levels of the predictors (28:581).
These facts indicate that ordinary least squares estimation of the model parameters is inappropriate.
When there are a sufficient number of repeat observations at each level of the predictors, the
parameter estimates can be obtained via weighted least squares (28:584-589). Otherwise, the
parameters may be estimated using maximum likelihood estimation (28:589-595).

In review, logistic regression is often applied when the response data are binary, and the
logistic regression model is characterized by three properties:

1.  Nonnormal  error  terms 

2.  Nonconstant  error  variance 

3.  A  constrained  response  function  (between  zero  and  one)  (28:580-581) 

These properties make the use of ordinary least squares estimation of the parameters inappropriate.

The above discussion of logistic regression assumes that the response data are originally
binary, zero or one, data. If the response data are proportions, but not derived from binary data,
an adaption of logistic regression is possible (see Reference (5)). Productive capacity, formulated
as $t^*/t$, is one such proportion which may be modeled with the adaption of the logistic regression
model. When the proportional response data are not derived from binary data, the logistic
regression model is not necessarily characterized by nonnormal error terms and nonconstant error
variance. This means that estimation of model parameters through ordinary least squares
estimation may be possible. There is of course the requirement to check the linear regression model
assumptions. Thus, the adaption of the logistic regression model to the nonbinary response case
involves:

1. Use of the logistic function

2. Linearization of the logistic function through creation of the logit response function

3. Estimation of the model parameters using ordinary least squares

4. Aptness analysis to check normality of error terms

This  adapted  logistic  regression  model  was  used  to  model  PC  in  this  thesis. 
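
The four steps above can be sketched in Python for the simplest case of a single predictor. This is a minimal illustration under stated assumptions, not the procedure or software used in the thesis: the data are hypothetical, the model is first-order rather than second-order, and the helper names `logit`, `inverse_logit`, and `simple_ols` are invented here.

```python
import math

def logit(p):
    """Logit transformation: ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Logistic function, mapping a logit back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def simple_ols(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical data: months of experience and a PC proportion (not thesis data).
experience = [6, 12, 24, 36, 48, 72, 96]
pc = [0.10, 0.14, 0.22, 0.27, 0.33, 0.38, 0.41]

# Steps 1-2: move the proportional response to the linearized (logit) scale.
pc_logit = [logit(p) for p in pc]
# Step 3: estimate the parameters by ordinary least squares.
b0, b1 = simple_ols(experience, pc_logit)
# Step 4 (starting point): residuals on the logit scale, for aptness checks.
residuals = [z - (b0 + b1 * x) for x, z in zip(experience, pc_logit)]
# Back-transform a fitted logit to the PC scale.
predicted_pc_at_60 = inverse_logit(b0 + b1 * 60)
print(round(b0, 3), round(b1, 4), round(predicted_pc_at_60, 3))
```

The residuals would then be examined with a residuals-versus-fitted plot and a normal probability plot, as the aptness analysis requires.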
Although primary interest was in predicting the aggregate or overall PC measures, the task-
level regressions were run as a screening exercise to identify any trends in the relationships between
the predictors and PC across tasks. This was to provide insight as to whether the number of
predictor terms might be reduced (the second phase of the model building process). The adapted
logistic model that was fit was a full second-order model to include aptitude/experience interaction
terms. A full second-order model was selected to account for any curvature or interaction effects
that may not have been accounted for with a first-order model. The model that was fit at the task
and aggregate level can be found in Equation 24.


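$$PC = \frac{e^{\beta_0 + \beta_1 X_1^2 + \beta_2 X_2^2 + \beta_3 X_1 X_2 + \beta_4 X_1 + \beta_5 X_2}}{1 + e^{\beta_0 + \beta_1 X_1^2 + \beta_2 X_2^2 + \beta_3 X_1 X_2 + \beta_4 X_1 + \beta_5 X_2}} + \epsilon \tag{24}$$

where

$PC$ = productive capacity

$X_1$ = ASVAB Mechanical percentile score

$X_2$ = months of job experience

$\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5$ = parameters to be estimated

$\epsilon$ = model error terms.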

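The logistic model in Equation 24 was written as the linear model in Equation 25. Writing the
equation in this fashion (linear in the parameters) allowed the model parameters to be estimated
using least-squares regression.

$$PC_{logit} = \ln\left(\frac{PC}{1 - PC}\right) = \beta_0 + \beta_1 X_1^2 + \beta_2 X_2^2 + \beta_3 X_1 X_2 + \beta_4 X_1 + \beta_5 X_2 + \epsilon \tag{25}$$

where

$PC_{logit}$ = the logit of productive capacity

$X_1$ = ASVAB Mechanical percentile score

$X_2$ = months of job experience

$\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5$ = parameters to be estimated

$\epsilon$ = the model error terms.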


After  the  logistic  models  were  fit  to  the  50  tasks  and  the  aggregate  measures,  a  forward 
stepwise  regression  was  run  for  the  aggregate  case  (weighted  average  PC).  This  was  in  accordance 
with  the  second  phase  of  the  model  building  process,  reducing  the  number  of  predictor  variables. 

After  performing  the  stepwise  regression,  the  resulting  aggregate  model  was  subjected  to  an 
aptness  analysis  to  include  a  plot  of  residuals  vs.  predicted  values  and  a  normal  probability  plot.  In 
reference  to  Figure  4,  the  aptness  analysis  concerns  the  third  phase  of  the  model  building  process, 
model  refinement  and  selection. 

After completing the regression and aptness analyses, predicted PC values were obtained for
the aggregate model for use in subsequent correlational analyses. Recall that the logit response
function in Equation 25 yields predicted values not for PC, but for the logit of PC. As a result,
predicted PC measures were derived from the predicted logits using Equation 26.


$$\widehat{PC} = \frac{e^{\widehat{PC}_{logit}}}{1 + e^{\widehat{PC}_{logit}}} \tag{26}$$

where

$\widehat{PC}$ = predicted productive capacity

$\widehat{PC}_{logit}$ = the predicted logit of productive capacity.
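
To make the back-transformation of Equation 26 concrete, the following Python sketch evaluates a second-order logit response function and converts the predicted logit to a predicted PC. The term order follows the coefficient labels used in Table 17; the coefficient values in `beta_example` and the function names are hypothetical and are not the fitted thesis estimates.

```python
import math

def logit_response(apt, exp_months, beta):
    """Second-order logit response:
    b0 + b1*Apt^2 + b2*Exp^2 + b3*Apt*Exp + b4*Apt + b5*Exp."""
    b0, b1, b2, b3, b4, b5 = beta
    return (b0 + b1 * apt ** 2 + b2 * exp_months ** 2
            + b3 * apt * exp_months + b4 * apt + b5 * exp_months)

def predicted_pc(apt, exp_months, beta):
    """Back-transform the predicted logit to a predicted PC (Equation 26 form)."""
    z = logit_response(apt, exp_months, beta)
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical coefficients chosen only to illustrate the arithmetic.
beta_example = (-2.0, 0.0, -1.0e-4, 0.0, 0.0, 0.02)
for months in (12, 48, 96):
    print(months, round(predicted_pc(70, months, beta_example), 3))
```

Whatever the coefficients, the back-transformed prediction always falls strictly between zero and one, which is the property that makes the logistic form suitable for a proportion such as PC.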


3.3.4.3 Analysis of the Aggregate Measures. Once the predicted aggregate PC
measures were computed, they were correlated with other measures of job performance to include
JKT (Job Knowledge Test) scores, GPC (supervisor's global PC ratings), and MTPC (mean timed
PC). The measure of correlation was $r$, the Pearson correlation coefficient.

The variable MTPC was created by AL/HRM using the PC formulation $t/t'$, an inversion of
the formulation of Carpenter and others (21) (7). MTPC was computed as the average PC across a
limited number of tasks where the $t$ values were derived through actual timing of tasks as opposed
to supervisor estimation. Because MTPC was computed from $t/t'$ values, higher values indicated
lower performance levels. This means that a negative Pearson correlation coefficient would be
expected between MTPC and a variable whose higher values indicate better performance.

This  correlational  analysis  was  to  provide  insight  as  to  whether  the  aggregation  method, 
weighting  scheme  and  fitted  model  were  effective  in  capturing  an  individual’s  true  overall  PC. 
Significant  correlation  with  other  performance  measures  was  to  be  interpreted  as  evidence  that  the 
aggregation  method,  weighting  scheme  and  fitted  model  were  appropriate. 


Last, the fitted logistic response surfaces were plotted for the weighted aggregate variable to
provide a graphic illustration of the fitted model. Surfaces were plotted for the entire effective range
of the predictor variables. Finally, response surfaces rescaled to zero/one space (see Equation 17)
were also plotted to increase the interpretability of the plots.
IV. Results


The  preceding  chapter  specified  in  detail  exactly  what  steps  were  taken  to  meet  each  research 
objective.  This  chapter  discusses  the  results  of  applying  those  steps  and  offers  further  discussions 
on  the  significance  of  the  research  findings. 

4.1 Formulation of a Productive Capacity Measure from Estimated Task Performance Times.

As  previously  indicated,  the  primary  response  data  used  in  the  analyses  were  the  supervisors’ 
estimates  of  the  subjects'  task  completion  times.  In  their  raw  form,  the  time  estimates  tended 
to  widely  vary  within  a  task,  sometimes  covering  an  unbelievable  range  of  values.  This  implied 
the  need  for  editing  of  the  raw  time  data  to  control  for  serious  outlying  cases.  Table  10  provides 
summary  statistics  for  the  raw  time  estimates,  illustrating  the  sometimes  extreme  variation  for  a 
task.  For  instance,  note  task  G171  which  shows  an  extremely  wide  range  of  values  for  the  raw 
estimated  times.  The  raw  times  ranged  from  a  minimum  of  one  to  a  maximum  of  2880  minutes.  It 
was  considered  highly  unlikely  that  the  true  range  of  times  is  so  variable.  This  led  to  the  editing 
as described in Section 3.3.1.2.

After the edited estimated times were computed, the task-level PCs were computed using the
$t^*/t$ formulation of Carpenter and others (5:21). These required further editing and rescaling as
described in Section 3.3.1.4. Table 11 shows the summary statistics for the final edited and rescaled
task PCs.

For the final edited values of the task PCs, the means ranged from .12 to .50 across tasks.
The standard deviations for the tasks ranged between .12 and .22. Note, in particular, that in only
two cases was the coefficient of variation, CV, noticeably greater than one (for tasks I299 and J332).
This is a very general indication that the task-level PC data are not highly dispersed relative to the
task means, and thus are probably not highly influenced by extreme outliers. In contrast, Table 10
of summary statistics of the raw times indicates the coefficient of variation was greater than one
for 13 tasks. It is thus apparent that the editing was effective in controlling the effects of outliers
and getting the variance to more reasonable levels.

Table 12. Average Percent Time Spent and Computed Task Weights by Duty Area

Duty           Skill Level            Task
Area       3        5        7        Weight
E          6%       10%      17%      11
F          18%      14%      7%       13
G          6%       5%       2%       4.33
H          13%      12%      7%       10.67
I          14%      13%      7%       11.33
J          5%       5%       2%       4
L          3%       3%       2%       2.67
M          3%       4%       2%       3
N          11%      8%       3%       7.33
P          8%       6%       4%       6


4.2 Selection of a Task Weighting Scheme.

After computing the task-level PCs, it was possible to weight them according to the weighting
scheme  described  in  Section  3.3.2.  Recall  that  the  weighting  scheme  actually  applied  weights  to 
each  job  duty  area,  and  all  tasks  from  a  particular  duty  area  were  assigned  the  same  duty  area 
weight.  Further  recall  that  the  weights  were  based  on  the  relative  amount  of  time  airmen  spend 
doing  particular  types  of  tasks. 

Table  12  shows  the  average  percent  time  spent  on  each  represented  duty  area  broken  out  by 
each  represented  skill  level.  It  also  shows  the  computed  weights  by  duty  area.  Again,  the  weights 
were an average of the average percent time spent across skill levels.
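
The weight computation can be reproduced from the percentages in Table 12. The Python sketch below is only an illustration: the dictionary layout and the function name are assumptions, and the duty area printed as "1" in the table is read here as area I. It averages the percent time spent across skill levels 3, 5, and 7 for each duty area.

```python
# Average percent time spent by duty area for skill levels 3, 5, and 7 (Table 12).
percent_time = {
    "E": (6, 10, 17), "F": (18, 14, 7), "G": (6, 5, 2), "H": (13, 12, 7),
    "I": (14, 13, 7), "J": (5, 5, 2), "L": (3, 3, 2), "M": (3, 4, 2),
    "N": (11, 8, 3), "P": (8, 6, 4),
}

def duty_area_weight(levels):
    """Task weight: mean of the percent time spent across skill levels."""
    return round(sum(levels) / len(levels), 2)

weights = {area: duty_area_weight(levels) for area, levels in percent_time.items()}
for area, weight in weights.items():
    print(area, weight)
```

Every task in a duty area then receives that area's weight when the aggregate PC is computed.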


4.3 Aggregation of the Task-Level Data into an Overall Productive Capacity Measure.


Once the task-level PC measures were computed and the task weights derived, it was possible
to compute the aggregate or overall PC measure. Recall that aggregate PC was defined as a
weighted average of the task-level PCs for an individual (see Figure 17). Also recall that a simple
unweighted average was computed for comparative purposes. Table 13 provides a brief description
of each aggregate variable created, for further reference.

Table 13. Aggregate Productive Capacity Measures Created

Variable                   Description
$PC_{uwavg}$               Unweighted average of the final edited task-level PCs per individual.
$PC_{wavg}$                Weighted average of the final edited task-level PCs per individual.
$\widehat{PC}_{uwavg}$     Predicted value of the unweighted average, $PC_{uwavg}$.
$\widehat{PC}_{wavg}$      Predicted value of the weighted average, $PC_{wavg}$.

Table 14. Summary Statistics for the Aggregate Productive Capacity Measures

Variable                   n             Mean          S.D.          Minimum       Maximum
$PC_{uwavg}$               [illegible]   [illegible]   [illegible]   [illegible]   .66
$PC_{wavg}$                [illegible]   [illegible]   [illegible]   [illegible]   .68
$\widehat{PC}_{uwavg}$     169           .32           [illegible]   .14           .36
$\widehat{PC}_{wavg}$      169           .32           [illegible]   .15           .37


The predicted values described in Table 13 were obtained from the estimated regression functions
which are discussed in the next section.

Table 14 provides some summary statistics and Figure 21 provides histograms for the
aggregate variables to give some insight into their distributions. Also, Table 15 shows the Pearson
correlation coefficient between the weighted and unweighted versions of the variables.


Table 15. Correlation Between the Weighted and Unweighted Aggregate Productive Capacity
Measures

Unweighted                 Weighted                   Correlation
Variable                   Variable                   Coefficient
$PC_{uwavg}$               $PC_{wavg}$                > .99^a  (n = 201)
$\widehat{PC}_{uwavg}$     $\widehat{PC}_{wavg}$      > .99^a  (n = 169)

Superscript a indicates significance at the $\alpha = .05$ level.


From  the  histograms  of  the  aggregate  variables  (see  Figure  21)  it  appears  that  the  predicted 
values  are  negatively  skewed.  This  is  a  reflection  of  both  the  shape  of  fitted  response  curves  and  the 
experience and aptitude levels of the sample. In other words, most of the sample possessed levels
of  the  predictor  variables  which  corresponded  to  the  higher  response  points  on  the  fitted  response 
surface.  Again,  the  reader  is  referred  to  the  next  section  for  discussion  of  the  fitted  response 
surfaces. 

The  summary  statistics  (see  Table  14)  and  the  Pearson  correlation  coefficients  between  the 
weighted  and  unweighted  versions  of  the  aggregate  variables  (see  Table  15)  indicated  that  the 
task  weighting  had  negligible  effects  on  the  aggregate  variables.  The  highly  similar  statistics  and 
correlation  coefficients  very  near  one  suggest  that  the  weighted  and  unweighted  versions  of  the 
variables  are  measuring  approximately  the  same  attributes.  Thus,  it  appears  that  the  weighting 
scheme  and  subsequent  weighted  averaging  were  ineffective  in  defining  overall  PC  beyond  what 
could  be  offered  by  simple  averaging. 


4.4 Development of Prediction Models.

As previously mentioned, full second-order logistic models were fit to PC both at the task
level and at the aggregate level. Recall that the logistic model was linearized through formation of
the logit response function, and the parameters were then estimated through ordinary least squares
estimation. Table 16 summarizes the results of the logistic model regressions for the tasks.

Table 16. Regression Results for the Full Second-Order Logistic Models at the Task Level


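Task    $\beta_0$    $\beta_1$ ($\times 10^{-4}$) Apt$^2$    $\beta_2$ ($\times 10^{-4}$) Exp$^2$    $\beta_3$ ($\times 10^{-4}$) Apt $\times$ Exp    $\beta_4$ Apt    $\beta_5$ Exp    $F_0$    $R^2$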

GUI 

-4.61 

-3.34 

-.881 

■3.13 

1.06 

G179 

-4.19 

-3.48 

-3.91“ 

3.00 

4.86* 

III 

G181 

2.73 

10.3 

-1.76 

-  002 

-.144 

1.87 

.0.5 

H202 

-.100 

-1  98 

Mi" 

.022 

1.79 

.05 

H203 

-1.53* 

2,31 

.116 

.040* 

.07 

H209 

-1.80 

-1.42 

.646 

.027 

.012 

.02 

H2I5 

1.04 

-2.27“ 

-1.12 

1.65 

.05 

H236 

-9.23* 

-11.7‘ 

-2  24“ 

-.838 

.189* 

.037 

2.76“ 

.08 

H237 

-6.40‘ 

-4.46 

-.401 

-5.77* 

.048“ 

2.15* 

.06 

H238 

-2.92 

-2.58 

1.91“ 

•1.19 

.053 

-.007 

3.02“ 

.09 

1247 

-2.97 

-2.48 

-.940 

2.64 

.037 

H!!9 

.93 

KSI 

1248 

-4.03 

-4.23 

-1.69“ 

-.128 

.072 

B  9 

2.65* 

.08 

1251 

-1.06 

1.07 

-2.90“ 

2.93 

-.024 

Ti  d 

2.22* 

.07 

1255 

-4.67 

-3.45 

-2.93“ 

.757 

.060 

WTi ;  1 

3.54“ 

.10 

1260 

1.22 

3.25 

-2.04“ 

2.94 

-.051 

T«  ■ 

3.50“ 

.10 

1264 

-3.99 

-2.79 

-2.41“ 

-037 

.042 

.034* 

2.70* 

.08 

1275 

-1.25 

-1.63 

-1.49“ 

.867 

.021 

.016 

2.17* 

.06 

1233 

.083 

-.022 

-1.4S* 

1.88 

-.013 

.0C5 

.86 

1284 

-2.91 

-2.09 

-1.61“ 

3.32 

.023 

.002 

2.4?* 

.07 

1286 

-5.81‘ 

-6.74 

-1.29* 

-1.02 

.113 

Bi  1 

2.43“ 

.07 

1299 

-10.68“ 

-14.0‘ 

-3.29“ 

2.01 

.207* 

BI9 

2.76“ 

.08 

1300 

-3.46 

-3.81 

-.733 

2.50 

.059 

-.004 

1.77 

.05 

J332 

-9.22 

-2.47“ 

.428 

.144 

.031 

3.28“ 

.09 

2.79 

-1.10“ 

-2.12 

.029“ 

2.45* 

.07 

J347 

-2.25 

2.47 

-2.45“ 

-.140 

.037 

3.42* 

.10 

J355 

-2.08 

.160 

-1.59“ 

-1.26 

.012 

.032* 

1.88 

.06 

L406 

-5.34 

-5.55 

.089 

.091 

1.98* 

.06 

L421 

-3.28 

-2.88 

-  473 

,046 

.09 

L436 

3.33 

6.31 

1.01 

-.100 

.500 

.02 

L437 

-11.49“ 

-16.1“ 

-4.36 

.258“ 

3.17* 

.09 

3.83 

WBBM 

-.063 

1.08 

BE 

-2.88 

■IH 

5.04* 

■of  tH 

-5.11 

-5.65 

KSBI 

-1.07 

BwW 

1.05 

.03 

N475 

-6.93“ 

-1.17 

.970 

.012 

1.59 

Kg 

N477 

2.76 

6.45 

-2.10“ 

2.79 

2.87* 

N436 

-6.14 

-3  07 

-.130 

-10.3* 

083“ 

2.44“ 

N487 

-2.86 

-1.52 

-.817 

-1.71 

■fWH 

1.78 

.05 

N488 

2.49 

7.19 

-1.65“ 

-.295 

-.103 

1.36 

.04 

N494 

-3.22 

-2.40 

-.401 

-1.78 

By  fj 

1.77 

.05 

N503 

-3.95 

-1.40 

-.818 

-3,57 

,040 

.045* 

1.97* 

.06 

P549 

-4.15 

-5.02 

-1.04 

-.181 

■rm 

Oil 

.49 

.01 

P554 

-2.82 

-4.23 

-.371 

067 

m 

1.03 

Rn 

P555 

.632 

4.92 

-1.35“ 

-  481 

-.068 

tm 

2.09* 

B3 

Superscript b indicates significance at the $\alpha = .10$ level.
Superscript a indicates significance at the $\alpha = .05$ level.

The regression results in Table 16 for the tasks indicate some consistent results. First, the
$R^2$s were consistently low, ranging from .01 to .13. Second, aptitude did not appear to have much
influence on task PCs as indicated by the statistical significance of the corresponding parameters.
The associated aptitude coefficients, $\beta_1$, $\beta_4$, or both, tested significantly different from zero at the
$\alpha = .05$ level for only two tasks. Third, experience seemed to be more strongly related to PC with
either $\beta_2$, $\beta_5$, or both, testing significantly different from zero at the $\alpha = .05$ level for 33 of 50 tasks.
It is important to note that the aptitude/experience interaction coefficient, $\beta_3$, tested significantly
different from zero for only four tasks. Overall, these results were in partial agreement with those
of Schmidt and others and Alley and others (1) (36). They also found that there does not appear
to be an aptitude/experience interaction affecting job performance. However, they found aptitude
to be an important determinant of job performance.

Overall, the task-level logistic models for predicting PC did not perform as well as AL/HRM's
task-level learning curve models for predicting untransformed estimated times (38). More of the
learning curve models (41 of 50) were significant and they yielded generally higher $R^2$s (ranging
from .01 to .20). But, as mentioned, learning curve models are only useful for determining how fast
a piece of work can be completed given the worker's aptitude and experience level. A transformation
must still be applied to the time data to provide a standardized, interpretable work output measure
like PC. A second drawback of the learning curve model is that it is difficult to develop a meaningful
model for predicting overall performance measures when the appropriate level of job specificity for
data collection is the task level. There seems to be no meaningful way of aggregating task-level
performance times into an overall measure that could be predicted.

Because  of  the  large  number  of  tasks  studied,  detailed  residual  analyses  to  check  model 
aptness  were  not  performed  at  the  task  level.  An  aptness  analysis  was  performed  for  the  model  for 
predicting  the  aggregate  measure. 

Table 17. Regression Results for the Full Second-Order Logistic Model at the Aggregate Level

Variable       $\beta_0$ Intercept    $\beta_1$ ($\times 10^{-4}$) Apt$^2$    $\beta_2$ ($\times 10^{-4}$) Exp$^2$    $\beta_3$ ($\times 10^{-4}$) Apt $\times$ Exp    $\beta_4$ Apt    $\beta_5$ Exp    $F_0$      $R^2$
$PC_{wavg}$    -2.97^a                -2.10                                   -1.21^a                                 -.010                                            .040             .023^a           6.07^a     .16

Superscript a indicates significance at the $\alpha = .05$ level.


Finally, in reference to the task-level models, recall that they were run primarily as a screening
exercise to provide insight as to whether any model terms from the second-order model could be
dropped. The task-level models obviously indicated that the terms including the aptitude variable
were potential candidates for removal from the model. The aggregate model was analyzed, in part,
to further explore this possibility.

Table 17 provides the regression results for the aggregate variable, $PC_{wavg}$, regressed on
aptitude and experience using the linearized logit response function. Table 17 contains some interesting
results. In predicting the aggregate measure, experience seemed to be an influencing factor. This
was indicated by $\beta_2$ and $\beta_5$ both testing significantly different from zero. The aptitude coefficients,
$\beta_1$ and $\beta_4$, tested not significantly different from zero. Overall, the results of the aggregate model
paralleled the results of the task models in that experience was an influencing factor, but aptitude
and the aptitude/experience interaction were not. These results again are in partial agreement
with those of Schmidt and others and Alley and others (1) (36). They found no interaction effects
but, in contrast, they did find significant aptitude effects. In comparison to the Air Force's other
PC studies, the aggregate model $R^2$ was comparable to those found for the AGE specialty by
Faneuff and others (13:10). They reported $R^2$s of .17 and .20 using the ASVAB E and M scores as
aptitude variables, respectively. But the $R^2$s of the current study were much lower than that for
the aggregate model of Carpenter and others ($R^2 = .44$) for specialty 328X0 (5:22).

As  mentioned,  the  results  of  the  regression  using  the  full  second-order  logit  p’sponse  model 
for  PC'riiovs  showed  that  none  of  the  parameters  for  terms  which  included  the  aptitude  nK'asnro 
tested  significantly  different  from  zero.  This  was  further  indication  that  the  aptitude  predictor  w.ts 
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Table  18.  Forward  Stepwise  Regression  Results  for  the  Second-Order  Logistic  Model  at  the  Ag¬ 
gregate  Level 


Superscript  a  indicates  significance  at  the  a  =  .05  level 

Table 19. ANOVA Table for the Aggregate Productive Capacity Measure after Forward Stepwise
Regression

Source of Variation      SS       df      MS      $F_0$
Regression               5.73     2       2.87    12.69^a
Error                    37.50    166     .23
Total                    43.23    168

Superscript a indicates significance at the $\alpha$ = .05 level

a candidate for removal from the model. A forward stepwise regression was then run beginning with
the full second-order model to determine if the aptitude terms could be dropped. The criterion for
a term's entry into the model was F statistic significance at the $\alpha$ = .05 level. The same criterion
was used for a term's departure from the model. The forward stepwise regression did in fact drop all
terms involving the aptitude variable from the model. Table 18 provides the results of the stepwise
regression and Table 19 provides the final ANOVA table.
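
The mean squares, F statistic and $R^2$ implied by the sums of squares in Table 19 can be recomputed directly. The following is a minimal sketch that uses only the values reported in the table; the function name `anova_summary` is illustrative and was not part of the original analysis.

```python
def anova_summary(ss_regression, df_regression, ss_error, df_error):
    """Return mean squares, the overall F statistic, and R^2 for a fitted regression."""
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    f_stat = ms_regression / ms_error
    r_squared = ss_regression / (ss_regression + ss_error)
    return ms_regression, ms_error, f_stat, r_squared


# Values as reported in Table 19 (aggregate model after stepwise regression).
msr, mse, f0, r2 = anova_summary(5.73, 2, 37.50, 166)
print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F0 = {f0:.2f}, R^2 = {r2:.3f}")
```

Because the table entries are rounded to two decimals, the recomputed F statistic should agree with the reported value only to within rounding.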

The model after stepwise regression was selected as the final model, provided that it would
meet the linear model assumption of normality of error terms ($\epsilon \sim N(0, \sigma^2)$). Figure 22 provides
the results of an aptness analysis for the final model to check the normality assumption. The figure
includes a plot of the model residuals vs. fitted values and a normal probability plot of the residuals.

The top plot in Figure 22, a plot of the residuals vs. the fitted values, shows a fairly even
band of points around the zero-residual line. This indicated that the variance of the residuals and
thus the variance of the actual error terms is fairly constant across differing levels of the predicted
values. This homoscedasticity is, of course, desirable. If the error variance was not constant across
Table 20. Correlation Between the Aggregate Productive Capacity Measures and Other Job
Performance Measures

                           JKT            GPC            MTPC

$PC_{wavg}$                .08            .44^a          -.13
                           (n = 193)      (n = 199)      (n = 58)

$\widehat{PC}_{wavg}$      .22^a          .25^a          -.27^b
                           (n = 169)      (n = 167)      (n = 51)

Superscript a indicates significance at the $\alpha$ = .05 level.
Superscript b indicates significance at the $\alpha$ = .10 level.


the different levels of the fitted values, then the model would not be appropriate for the fitted region
since it is assumed that the error terms, $\epsilon$, are distributed $N(0, \sigma^2)$. The bottom plot, a normal
probability plot, shows a high degree of linearity toward the center of the data with a few outliers at
each end showing clear nonlinearity. Linearity is desirable because it indicates the actual residuals
and their expected values under the normal assumption are highly correlated. Linearity implies
normality of the residuals and thus normality of the model error terms. The nonlinearity at the ends
of the plot was not overly worrisome since it is due to a relatively small number of outlying points.
Overall, the conclusion was that the fitted second-order logistic model with aptitude excluded was
appropriate for the data.
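
The linearity of a normal probability plot can be summarized by the correlation between the ordered residuals and their expected values under normality. The sketch below illustrates that check on synthetic residuals only; the plotting-position formula $(i - .375)/(n + .25)$ is one common convention and is an assumption here, as are the function name and the simulated data, none of which come from the thesis.

```python
import random
from statistics import NormalDist, mean


def normal_plot_correlation(residuals):
    """Correlation between ordered residuals and their expected normal scores."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.375) / (n + 0.25) for i = 1..n (assumed convention).
    scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_r, mean_s = mean(ordered), mean(scores)
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(ordered, scores))
    var_r = sum((r - mean_r) ** 2 for r in ordered)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return cov / (var_r * var_s) ** 0.5


# Synthetic residuals for illustration only; these are not the thesis data.
rng = random.Random(0)
normal_like = [rng.gauss(0.0, 0.5) for _ in range(169)]
skewed = [rng.expovariate(1.0) for _ in range(169)]

print(normal_plot_correlation(normal_like))  # near 1 when residuals look normal
print(normal_plot_correlation(skewed))       # lower when residuals are skewed
```

A correlation close to one corresponds to the nearly straight central portion of the plot described above; departures in the tails pull the coefficient down.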

4-4 ^ Correlational Analysis of the Estimated Model Results. After the aptness analysis,
predicted values of $PC_{wavg}$ ($\widehat{PC}_{wavg}$) were obtained from the final fitted model. As part of the
model assessment, it was determined that the correlation between the final model's predicted values
and other job performance measures would offer insight as to the model's effectiveness. Table 20
shows the correlation between the computed and predicted aggregate variables and other previously
defined job performance measures collected under the Productive Capacity Project.

Table 20 indicates that the aggregate variable computed from the task-level PC measures,
$PC_{wavg}$, correlated more highly with GPC than did the associated predicted variable, $\widehat{PC}_{wavg}$.
This is not terribly surprising since GPC, like $PC_{wavg}$, is also the result of supervisor estimation.


Table 21. Correlation Matrix of the Other Job Performance Measures

             JKT            GPC            MTPC

JKT          1.0^a
             (n = 196)

GPC          .12            1.0^a
             (n = 191)      (n = 199)

MTPC         -.44^a         -.18           1.0^a
             (n = 60)       (n = 57)       (n = 60)

Superscript a indicates significance at the $\alpha$ = .05 level.


The predicted values, $\widehat{PC}_{wavg}$, correlated more strongly with the objectively-derived measures,
JKT and MTPC.

There seemed to be a pattern of higher correlation between the predicted values and the more
objective measures. A similar pattern existed between the computed average measure, $PC_{wavg}$, and
the more subjective measure, GPC. This seemed to indicate that the subjectively-derived measures
are measuring different dimensions of performance than the predicted variable and the objective
variables.

One final noteworthy finding is the relatively low correlation between mean PC derived from
actual stopwatch times (MTPC) and computed average PC derived from supervisor estimates
($PC_{wavg}$). This is an indication that the supervisors' ratings may be measuring a different
dimension of performance than the actual stopwatch times, or may contain a great deal of noise
resulting from rating biases of the supervisors.

To summarize the results of the correlational analysis, $PC_{wavg}$ correlated more strongly with
GPC than did the associated predicted values. This seemed to indicate that the predicted values, and
thus the model, captured less of global PC (as judged by the supervisors in their GPC ratings) than
the computed average data. This may be an indication that the model is not measuring what it is
supposed to measure, namely overall PC. On the other hand, the predicted values did correlate more
highly with the other objective measures, indicating that the model is predicting job performance in
at least one respect. The assessment of the model through correlational analysis thus gives conflicting
results.
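
The significance flags attached to the correlations in Table 20 can be checked approximately from the reported coefficients and sample sizes. The sketch below uses the Fisher z approximation, which is an assumption; the thesis does not state which test was applied, and the function name is illustrative.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 using the Fisher z approximation."""
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# (measure, r, n) for the predicted aggregate measure, as reported in Table 20.
predicted_rows = [("JKT", 0.22, 169), ("GPC", 0.25, 167), ("MTPC", -0.27, 51)]
for label, r, n in predicted_rows:
    print(f"{label}: r = {r:+.2f}, n = {n}, p = {correlation_p_value(r, n):.3f}")
```

Under this approximation the MTPC correlation falls between the .05 and .10 levels, which is consistent with its superscript b in the table, while the smaller sample size for MTPC explains why a larger coefficient carries weaker significance.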

Table  21  was  included  simply  to  give  the  reader  an  indication  of  how  the  other  job  performance 
measures  relate  to  one  another. 


Graphical Representation of the Estimated Logistic Models. The preceding regression
results and correlational analyses were helpful in providing insight as to how the predictors
potentially influence PC, and how the aggregate variables (computed and predicted) relate to other
job performance measures. This section is intended to provide additional insight into the estimated
model by providing a graphical representation of the fitted models. Figure 23 shows the fitted
response curve for the final model to provide a graphical representation of the relationship between
experience and PC. It is plotted over the effective range of the predictor, experience (one to 170
months). Figure 24 shows the plotted surface for the full second-order model prior to the stepwise
regression, to show the relatively mild effects of aptitude and interaction on estimated PC. Recall
that in the stepwise regression, the aptitude terms were dropped. The full model is likewise plotted
over the effective range of predictors, aptitude (M score 45-99) and experience (one to 170 months).

The fitted response surfaces were obtained by entering the logistic model parameter
estimates into the logistic model function. Equation 27 shows the equation for the final model, and
Equation 28 shows it for the full second-order model.


$$\widehat{PC}_{wavg} = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_2 X_2^2 + \beta_5 X_2\right)\right]} \tag{27}$$

where

$\widehat{PC}_{wavg}$ = predicted weighted average productive capacity from the final model after the stepwise procedure
$X_2$ = months of job experience
$\beta_0$ = -1.231482
$\beta_2$ = -.000131
$\beta_5$ = .019038.

$$\widehat{PC}_{wavg} = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_1 X_1^2 + \beta_2 X_2^2 + \beta_3 X_1 X_2 + \beta_4 X_1 + \beta_5 X_2\right)\right]} \tag{28}$$

where

$\widehat{PC}_{wavg}$ = predicted weighted average productive capacity from the full second-order model
$X_1$ = ASVAB Mechanical percentile score
$X_2$ = months of job experience
$\beta_0$ = -2.969180
$\beta_1$ = -.000210
$\beta_2$ = -.000124
$\beta_3$ = -.000061
$\beta_4$ = .039573
$\beta_5$ = .022894.
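
The fitted models can be evaluated numerically from the listed parameter estimates. The sketch below is illustrative only and rests on two assumptions that the surviving text does not state outright: that the response function has the form $1/(1 + e^{-\eta})$, and that $\beta_1$ and $\beta_2$ multiply the squared terms, $\beta_3$ the interaction, and $\beta_4$ and $\beta_5$ the linear terms. The function and dictionary names are hypothetical.

```python
from math import exp

# Parameter estimates as listed for Equations 27 and 28.
FINAL = {"b0": -1.231482, "b2": -0.000131, "b5": 0.019038}
FULL = {"b0": -2.969180, "b1": -0.000210, "b2": -0.000124,
        "b3": -0.000061, "b4": 0.039573, "b5": 0.022894}


def logistic(eta):
    """Logistic response 1 / (1 + exp(-eta)) (assumed form)."""
    return 1.0 / (1.0 + exp(-eta))


def pc_final(experience):
    """Predicted PC from the final (experience-only) model."""
    p = FINAL
    return logistic(p["b0"] + p["b2"] * experience ** 2 + p["b5"] * experience)


def pc_full(aptitude, experience):
    """Predicted PC from the full second-order model (assumed term assignment)."""
    p = FULL
    eta = (p["b0"] + p["b1"] * aptitude ** 2 + p["b2"] * experience ** 2
           + p["b3"] * aptitude * experience
           + p["b4"] * aptitude + p["b5"] * experience)
    return logistic(eta)


# Predicted PC at several experience levels; aptitude fixed at an arbitrary 70.
for months in (1, 36, 72, 120, 170):
    print(months, round(pc_final(months), 3), round(pc_full(70, months), 3))
```

Under these assumptions the predictions rise with experience, level off, and then decline, which matches the shape described for the plotted curve and surface.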


To increase the interpretability of the fitted response curve and surface, the entire surfaces
were rescaled to zero/one space much like the edited task-level PC values. This was to ensure
a minimum predicted PC of zero and a maximum of one so that PC could be interpreted as a
proportion of maximum possible output. Equation 29 mathematically shows the equation for the
rescaled surfaces. The rescaled response surfaces are shown in Figure 25 and Figure 26 for the final
and full models, respectively.


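$$\widehat{PC}_{wavg_{rescaled}} = \frac{\widehat{PC}_{wavg} - \widehat{PC}_{wavg_{min}}}{\widehat{PC}_{wavg_{max}} - \widehat{PC}_{wavg_{min}}} \tag{29}$$

where

$\widehat{PC}_{wavg_{rescaled}}$ = rescaled predicted average mean productive capacity
$\widehat{PC}_{wavg_{min}}$ = minimum value of $\widehat{PC}_{wavg}$
$\widehat{PC}_{wavg_{max}}$ = maximum value of $\widehat{PC}_{wavg}$.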
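
The rescaling in Equation 29 is an ordinary min-max transformation of the predicted values. The following sketch applies it to final-model predictions over the effective range of experience; the logistic form $1/(1 + e^{-\eta})$ and the one-month evaluation grid are assumptions made for illustration, and the function names are hypothetical.

```python
from math import exp


def pc_final(months):
    """Final-model prediction, assuming the form 1 / (1 + exp(-eta))."""
    eta = -1.231482 - 0.000131 * months ** 2 + 0.019038 * months
    return 1.0 / (1.0 + exp(-eta))


def rescale(values):
    """Min-max rescaling of predicted values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


months = list(range(1, 171))              # effective range of experience
predicted = [pc_final(m) for m in months]
rescaled = rescale(predicted)
print(min(rescaled), max(rescaled))
```

The rescaled curve keeps the shape of the original predictions; only the vertical scale changes, so the location of the peak is unaffected.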

The plotted response curves and surfaces all show PC initially increasing with experience
until it reaches a maximum, and then beginning to steadily decrease. In the case of the plotted surface
for the full second-order model, this is shown to occur at all levels of aptitude. The plots for the
full model also show PC generally increasing with aptitude at all levels of experience. There does
appear to be a peak and a slight decrease in PC with increasing aptitude. Once again, in reference
to the plot of the full model, very little interaction was present, as indicated by the fairly constant
effects of one predictor with varying levels of the other.

Before drawing conclusions, it is important to recall that the models did not fit the data very
well (for the full model $R^2$ = .16, and for the final model $R^2$ = .13). Also, recall from Table 9,
the two-way distribution of aptitude and experience, that there were relatively few data points
indicating experience beyond 96 months. The model must thus be interpreted cautiously beyond
this point. These two facts suggest that the response curves and surfaces should not be viewed
with exactness, but in general terms. They should serve only to provide some possible insight as
to how the factors might affect each other.


The  decreasing  PC  with  increasing  experience  over  a  portion  of  the  curves  and  surfaces  was 
an  unexpected  result.  This  seemed  to  indicate  that  there  is  some  point  in  an  airman's  career 
Figure 25. Rescaled Fitted Response Curve for Productive Capacity Over the Effective Range of
Experience


where he or she may begin to experience skill degradation, or decreasing PC on the types of
tasks studied. The estimated logistic function for the final model was put into GINO (General
INteractive Optimizer) to identify exactly where the maximum PC point on the curve occurs, and
thus where the performance degradation may begin (22). Maximum PC occurs on the surface at
about 71 months of job experience. This seemed to indicate that after approximately six years of
job experience, the capacity to perform hands-on production-type job tasks decreases for 454X1
personnel. This result might be explained by the fact that Air Force enlisted personnel typically
begin to make the transition into supervisory roles at around the six year point. This means they
begin to spend less time practicing production-type tasks, so skill degradation might reasonably be
expected. This is not to say that an airman's overall performance decreases after the six year point,
only performance on the types of tasks studied under the Productive Capacity Project. Hands-on
performance on such types of tasks becomes decreasingly important as airmen advance in grade
and move on to supervisory roles. A more appropriate measure of performance for more senior
members would most likely have to include measures of their ability to supervise.

A final point to be made concerning Figure 24 and Figure 26 for the full model is that
maximum PC occurred near an aptitude score of 84, according to GINO. In looking at the plotted
surfaces, there does not seem to be a significant decrease in performance beyond this score. There
is simply no strong indication that PC truly does peak and then decrease with increasing aptitude.
Since the model provided significantly less than a perfect fit, it may simply be enough to note that
PC tends to increase with aptitude, in general, at all levels of experience.
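
Because the logistic function is monotone, the maximum of predicted PC coincides with the maximum of the quadratic exponent, so the peak locations can be found in closed form rather than with a general-purpose optimizer. The sketch below is not the GINO procedure used in the thesis; it assumes the term assignment in which $\beta_1$ and $\beta_2$ multiply the squared aptitude and experience terms, $\beta_3$ the interaction, and $\beta_4$ and $\beta_5$ the linear terms, and the function names are hypothetical.

```python
def peak_final(b2, b5):
    """Experience at which b2*x^2 + b5*x is maximized (requires b2 < 0)."""
    return -b5 / (2.0 * b2)


def peak_full(b1, b2, b3, b4, b5):
    """Stationary point (aptitude, experience) of the full quadratic exponent.

    Solves the gradient equations
        2*b1*x1 +   b3*x2 = -b4
          b3*x1 + 2*b2*x2 = -b5
    by Cramer's rule.
    """
    det = 4.0 * b1 * b2 - b3 ** 2
    x1 = (-2.0 * b2 * b4 + b3 * b5) / det
    x2 = (-2.0 * b1 * b5 + b3 * b4) / det
    return x1, x2


months_final = peak_final(-0.000131, 0.019038)
aptitude_full, months_full = peak_full(-0.000210, -0.000124, -0.000061,
                                       0.039573, 0.022894)
print(months_final, aptitude_full, months_full)
```

With both squared-term coefficients negative and a positive determinant, the stationary point of the full model is a maximum. The computed locations should fall close to the roughly six-year experience point and the aptitude score in the mid-80s discussed above.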




V. Summary, Conclusions and Recommendations

Recognizing that the Air Force could greatly benefit from acquiring the ability to forecast
the future job performance of its personnel, this research effort set out to develop experimental,
descriptive regression models for predicting the job performance of personnel in specialty 454X1,
Aerospace Ground Equipment. Hopefully, this modeling activity will serve to help Air Force
planners take another step in their iterative and on-going quest for adequate job performance models.

The  research  objectives,  as  presented  in  Chapter  1,  were  as  follows: 

1.  Formulate  a  Productive  Capacity  Measure  from  Estimated  Task  Performance  Times 

2.  Select  a  Task  Weighting  Scheme 

3.  Aggregate  the  Task-Level  Data  into  an  Overall  Productive  Capacity  Measure 

4.  Develop  Prediction  Models 

Data  for  the  analyses  were  collected  by  the  Air  Force  under  its  Productive  Capacity  Project 
(21).  The  primary  dependent  (response)  variables  were  raw  estimated  task  performance  times  for 
airmen  in  specialty  454X1,  and  the  independent  (predictor)  variables  were  mechanical  aptitude  and 
job  experience. 

The following sections provide a brief recapitulation of the research methods used to meet
these research objectives and contain a summary of conclusions and recommendations for further
research.

5.1 Summary and Conclusions.

5.1.1 Formulating a Productive Capacity Measure from Estimated Task Performance Times.
The primary response data analyzed were raw estimated task performance times for 204 airmen
in specialty 454X1, Aerospace Ground Equipment. The estimated times were provided by the
airmen's supervisors for 50 job tasks commonly performed by personnel in the specialty. An initial


research objective was to determine how to transform the task-level time data into measures that
are interpretable and able to be aggregated across tasks. At the task level, an interpretable measure,
PC, was formulated according to the method proposed by Carpenter and others, $t^*/t$ (5:21). In the
formulation, $t^*$ represented an estimate of the fastest possible time in which a given task could be
completed, and $t$ represented the estimated time for an airman to complete that task. The measure
can be interpreted as an individual's output as a proportion of maximum possible output.

Several considerations had to be accounted for in computing task-level PC. Most importantly,
the raw estimated performance times from which the PC measures were derived tended to be highly
variable with an often unbelievable range of values within a task. This indicated a need for editing
to control for influential outliers. As a result, several stages of data editing were applied to the raw
estimated times and to the computed PCs to obtain reasonable distributions of the task-level PCs.

5.1.2 Selecting a Task Weighting Scheme. Since PC is a quantity-based measure of work
output capability, it seemed appropriate to weight the tasks by the relative amount of time airmen
spend doing them on average. This was to account for the fact that airmen may spend varying
amounts of time on different tasks, some of which they are productive on, and some on which they
are not. Tasks were weighted by a factor derived through averaging Average Percent Time Spent
Performing Duties data (collected by the Occupational Measurement Squadron) across relevant
skill levels. Duty area weights were applied to tasks from that area. Greater weight went to those
tasks performed most frequently.

The applied weighting scheme had little effect on the computed aggregate variables. The
weighted average measures, when compared to their unweighted counterparts, had highly similar
descriptive statistics. The weighted and unweighted versions of the variables were also highly
correlated. The conclusion is that the applied weighting scheme had no noticeable effect on aggregate
PC.
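
The comparison between weighted and unweighted aggregation can be illustrated with a small example. The task-level PC values and duty-area weights below are hypothetical and were chosen only to show the computation; they are not values from the Productive Capacity Project.

```python
def weighted_average(values, weights):
    """Weighted mean of task-level PC values; weights need not sum to one."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical task-level PCs for one airman, with the (hypothetical) percent
# time spent in the duty area that each task belongs to.
task_pc = [0.42, 0.55, 0.31, 0.60, 0.48]
duty_area_weight = [12.0, 12.0, 5.0, 3.0, 8.0]

pc_wavg = weighted_average(task_pc, duty_area_weight)
pc_avg = sum(task_pc) / len(task_pc)
print(round(pc_wavg, 3), round(pc_avg, 3))
```

When tasks in the same duty area share a weight and the task-level PCs do not vary much across duty areas, the two averages stay close, which is one plausible reason the weighting scheme had so little visible effect.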

5.1.3 Aggregating the Task-Level Data into an Overall Productive Capacity Measure. At
the overall or aggregate level, PC was defined and computed as a weighted average of the task-level
PC values for each experimental subject. Along with weighted averaging, the task-level PC
data were aggregated through simple, unweighted averaging for comparison purposes. The need
for aggregation existed because overall measures are of more importance in the bigger scheme of
manpower modeling and planning. Task-level information is important but manpower decisions
usually cannot be made based on an individual's predicted performance on single tasks. Also,
modeling at the task level for each of the approximately 250 AFSs would simply be too cumbersome.
Because jobs tend to be multifaceted and dynamic, and because task-level modeling is potentially
too burdensome, it was desirable to compute and model an aggregate measure.

5.1.4  Developing  Prediction  Models.  Both  task-level  and  aggregate  PC  measures  were 
regressed  on  aptitude  and  job  experience  using  a  second-order  logistic  model.  The  aptitude  variable 
used  was  the  Mechanical  percentile  score  from  the  ASVAB  obtained  by  each  subject  upon  applying 
for  enlistment.  The  experience  variable  used  was  the  subjects’  self-reported  job  experience  at  the 
time  the  estimated  times  were  collected. 

At the task level, $R^2$s were consistently low for the logistic model, ranging from .01 to .13.
This may indicate that there are other predictor variables influencing PC that were not addressed
in this thesis. Another possible explanation of the low $R^2$s is that the assumption of validity and
reliability of the PC data collection instrument and method is not sound. Supervisors are known
to be subject to many types of biases which affect their judgements concerning the performance of
their personnel (6:82-84). The low $R^2$s may indicate that the supervisors are introducing
noise into the data from such biases, adversely affecting validity and reliability, and thus
model fit.

Residual analysis of the aggregate logistic model indicated that it was reasonably appropriate
for the data. The model for predicting the aggregate measure yielded results that were comparable
to the task-level model results. Experience seemed to be a significant predictor while aptitude
and the aptitude/experience interaction did not. The model $R^2$ for the full second-order aggregate
logistic model was .16. The full second-order model was subjected to forward stepwise regression
which indicated that all terms involving the aptitude variable could be dropped from the model.
The final model involved a constant intercept term and linear and quadratic experience terms. The
final model yielded an $R^2$ of .13.

After the logistic model parameters were estimated, predicted PC values were computed for
the aggregate measure. These were correlated with other subjective and objective job performance
measures collected under the Productive Capacity Project. The predicted values showed
correlations significantly different from zero for each measure.

Fitted response surfaces for the estimated aggregate models were plotted and they indicated
a pronounced peak for PC with respect to experience. There was some evidence that PC may
begin to decrease for AGE personnel after about the six year point in their career. This may be
reflective of skill degradation which may occur as airmen lose practice on hands-on type work as
the transition to supervisory roles is made. It may also be the result of having only a few data
points for higher levels of experience, or it may be simply an artifact of the relatively low degree
of model fit.


Overall, the level of model fit ($R^2$) tended to be low, but comparable to that found for similar
studies (5) (13) (38). $R^2$s of the current magnitude indicate that more work must be done to create
more robust prediction models.


5.2 Recommendations.

The previous section provided a brief summary of the research objectives, methodology and
findings. It did not, however, discuss the additional research questions which arose during the effort.
As mentioned in the first chapter, exploratory or descriptive research such as this often spawns as
many research questions as it answers. This section will address some of the issues which came to
the forefront in the current effort. These issues will be discussed in the context of recommendations
for further research.

5.2.1 Formulating the Productive Capacity Measure. The current emphasis in Air
Force job performance measurement is on using performance time data in deriving job performance
measures. This is because most Air Force manpower modeling and planning involves performance
criteria such as sortie generation rates and mean time to repair aircraft. Such measures are
quantity-based and therefore indicate the need to assess and predict work output referenced to time. As
a result, the Air Force is researching cost-effective methods for obtaining work performance time
data.

As indicated in the current analyses, the current method of obtaining the performance time
estimates (through free response supervisor estimation) yielded ranges of values which were
excessively wide (see Table 10). This may be due in part to the fact that supervisors provided their
estimates in a virtual free response format. This means that they were unconstrained in reference
to the estimates they could make. In future studies, it is recommended that supervisors be required
to limit their time estimates to a pre-established reasonable range. A reasonable range of estimates
could be derived using SMEs, much like Leighton and others used SMEs to develop benchmark
times (21).

Other recommendations involve the formulation of PC measures from the task performance
time data. One potential problem with creating PC measures according to the Carpenter and
others formulation, $t^*/t$, is that the computed task-level PCs for each individual are in part based
on the single task-level measure $t^*$. Since the computed PCs are based on them, care must be taken
to obtain $t^*$ values that are accurate so that the resulting PC values are properly interpretable. In
the current research, it was pointed out that the raw estimated times, $t$, for each task tended to
be highly variable, indicating inconsistencies in the supervisors' opinions about what a reasonable
range of performance should be. This places some doubt on the accuracy of the estimated times,
especially those near the fast and slow ends of the estimated range. Since $t^*$ was computed as
.99 x (minimum estimated time for the task (after editing)), there is some question as to the
credibility of such $t^*$ values. An appropriate way to address this problem may be to compute the
PC measure as in time studies, where PC is computed as $(t_{avg}/t) \times 100$. Computed in this fashion,
PC for a task is not dependent on a single measure $t^*$, but on the task average, $t_{avg}$.
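
The two formulations discussed above can be compared side by side. The sketch below computes task-level PC both as $t^*/t$, with $t^*$ taken as .99 times the minimum estimated time, and as the time-study ratio $(t_{avg}/t) \times 100$. The estimated times are hypothetical and the function names are illustrative.

```python
def carpenter_pc(times, shrink=0.99):
    """Task-level PC as t*/t, with t* = shrink * (minimum estimated time)."""
    t_star = shrink * min(times)
    return [t_star / t for t in times]


def time_study_pc(times):
    """Task-level PC as (t_avg / t) * 100, referenced to the task average."""
    t_avg = sum(times) / len(times)
    return [100.0 * t_avg / t for t in times]


# Hypothetical edited supervisor estimates (minutes) for a single task.
times = [20.0, 25.0, 30.0, 45.0, 60.0]
print([round(p, 3) for p in carpenter_pc(times)])
print([round(p, 1) for p in time_study_pc(times)])

# Lowering only the fastest estimate changes every airman's t*/t value.
print([round(p, 3) for p in carpenter_pc([10.0] + times[1:])])
```

The last line shows the sensitivity at issue: a single questionable minimum shifts all PC values under the $t^*/t$ formulation, whereas the time-study ratio depends on the task average and is less affected by one extreme estimate.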

Also, it is important to note that Carpenter and others' PC formulation is not a linear
transformation of the time variable, $t$, from which it is computed. This is what prompted AL/HRM
to formulate PC as $t/t^*$, an inversion of the Carpenter and others formulation (7). A nonlinear
transformation can have the effect of influencing the degree of linear relationship between a variable
and another. It is recommended that the nonlinearity introduced by the Carpenter and others'
formulation be studied to determine its effect, and whether a linear transformation should be
considered in future studies.
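
A small numerical illustration shows how the choice of transformation can change the apparent strength of a linear relationship. In the sketch below, hypothetical performance times decline exactly linearly with a hypothetical predictor; the linear form $t/t^*$ preserves that perfect linear association, while the reciprocal form $t^*/t$ does not. All data and names are invented for illustration.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical predictor and times that fall linearly with it.
experience = [1, 2, 3, 4, 5, 6]
times = [60.0 - 8.0 * e for e in experience]
t_star = 0.99 * min(times)

pc_reciprocal = [t_star / t for t in times]   # Carpenter and others form
pc_linear = [t / t_star for t in times]       # inverted (linear) form

print(pearson(experience, pc_linear))
print(pearson(experience, pc_reciprocal))
```

The correlation for the linear form has magnitude one, while the reciprocal form yields a smaller magnitude even though the underlying times are perfectly linear in the predictor.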

5.2.2 Selecting a Task Weighting Scheme. Because of the nature of the PC measure
(quantity-based), it is recommended that relative time spent measures continue to be considered
as a weighting factor. The PC measure, as defined, is indicative of a worker's output relative to
some standard. In the current effort, that standard was $t^*$, an estimate of the fastest possible
performance time. As such, the PC measure at the task level must somehow be given different
weights reflective of how often the tasks are performed. This is so that an aggregate measure
which represents an airman's actual capacity to produce (given the average job scenario) can be
computed. Recall that in the current effort, weights were derived for job duty areas as opposed to
individual tasks. This was due to the unavailability of task-level data. It is recommended that an
attempt be made to obtain and use relative time spent data derived for individual tasks as opposed
to those for an entire duty area. This will further differentiate tasks on level of importance and
may yield a more meaningful aggregate PC measure.




5.2.3 Aggregating the Task-Level Data into an Overall Productive Capacity Measure.

One  problem  with  averaging  (both  weighted  and  unweighted)  task-level  measures  is  that  there  is 
significant  information  loss.  In  this  thesis  for  instance,  the  actual  response  data  for  each  individual 
was  a  row  vector  of  about  50  task-level  PC  measures  (see  Figure  17).  By  weighted  averaging,  these 
were  collapsed  into  a  single  measure.  In  collapsing  the  data,  any  unique  information  provided  in 
individual  task  ratings  was  lost  or  dampened.  Perhaps  a  reduction  in  the  dimensionality  of  the 
response  from  50  measures  per  person  to  one  measure  per  person  was  too  drastic. 


One alternative to averaging is to treat the 204 x 50 (subjects x tasks) response matrix
as a multivariate analysis problem. A common dimensionality-reduction technique that could be
applied is factor analysis. According to Dillon and Goldstein,


Factor analysis attempts to simplify complex and diverse relationships that exist among
a set of observed variables by uncovering common dimensions or factors that link
together seemingly unrelated variables, and consequently provides insight into the
underlying structure of the data. (11:53)


In other words, factor analysis could be used to reduce the original set of 50 response variables
to a smaller subset of factors that account for most of the variance in the task-level data (11:23).
In factor analysis, a factor represents an underlying qualitative dimension like a coordinate axis,
which defines the way in which different variables differ on that dimension (11:60). Factor analysis
results in factor scoring coefficients which can be used to compute factor scores given known levels
of the analyzed variables. Factor analysis basically takes advantage of the underlying correlational
structure in the variables under analysis. Factors are derived such that correlated variables tend
to load on the same factors. Factors, then, represent common dimensions that correlated variables
share. For a more complete discussion of factor analysis, refer to Dillon and Goldstein (11).


The response matrix in the current study could be factor-analyzed to determine any factor
structure that could be used to reduce the number of response variables to a set of less than 50
factors. Prediction models could then theoretically be developed to predict computed factor scores.
Factor analysis seems to be a reasonable midpoint between collapsing the data into a single measure
through averaging, and modeling with task-level data. The analyst or manpower modeler would of
course be left with the non-trivial task of interpreting the factors and resulting factor scores.

Another alternative for reducing the response matrix to fewer than 50 variables would be to
compute aggregate measures at the duty area level. Referring to Table 4, there are 20 duty areas
for the AGE specialty, 10 of which were represented by tasks in the current effort. The reduction
from 50 task-level variables to 10 or 20 duty area variables would be substantial. Aggregating
tasks from the same duty area, perhaps through weighted averaging, would provide aggregate
variables representing reasonable subsets of tasks. These duty area aggregate variables could then
be modeled.
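
A minimal sketch of duty area aggregation through weighted averaging follows. It assumes that the leading letter of a task code identifies its duty area, which is an assumption made here for illustration. The scores, the weights, and the function name are likewise hypothetical; in practice the weights might reflect task importance or frequency.

```python
from collections import defaultdict


def duty_area_scores(task_scores, weights=None):
    """Aggregate task-level scores into one weighted mean per duty area.

    The duty area is assumed to be the first character of the task code
    (for example "H202" belongs to area "H"). Tasks without an explicit
    weight receive a weight of 1.0.
    """
    weights = weights or {}
    totals = defaultdict(float)
    weight_sums = defaultdict(float)
    for task, score in task_scores.items():
        w = weights.get(task, 1.0)
        area = task[0]
        totals[area] += w * score
        weight_sums[area] += w
    return {area: totals[area] / weight_sums[area] for area in totals}


# Hypothetical task-level scores for one airman on an arbitrary 0 to 1 scale.
scores = {"E120": 0.80, "E143": 0.60, "H202": 0.50, "H237": 0.90}
# Hypothetical weights: H237 counts three times as much as H202.
weights = {"H202": 1.0, "H237": 3.0}
areas = duty_area_scores(scores, weights)
```

With these inputs the two E tasks are averaged equally, while the H aggregate is pulled toward the more heavily weighted task.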

In summary, multivariate analysis techniques and duty area aggregation provide other
alternatives for reducing the dimensionality of the response data. The attractiveness of such alternatives
is that they may not be subject to the same degree of information loss as in the case of averaging
all the task-level data for an individual task into a single measure.

5.2.4 Developing Prediction Models. Recall that the regression models developed in this
thesis accounted for at most 16% of the variance in the response, PC (maximum R² = .16). This
means that at least 84% of the variance in the response remains unexplained by the developed
models. To put this in context, consider Figure 27. Figure 27 indicates that a very large
portion of the variance in the response remains to be explained. Recall that these
results were comparable for previous PC studies (5) (13) (38). This means that there is probably
significant improvement to be made in all phases of the job performance model development process.
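
The arithmetic behind these percentages can be made explicit. The sketch below computes R² as one minus the ratio of the residual sum of squares to the total sum of squares, and converts an R² value into the unexplained share of variance. The function names and the four-point data set are hypothetical and serve only to check the calculation; the value .16 is the maximum R² reported above.

```python
def r_squared(observed, predicted):
    """Coefficient of determination computed as 1 - SSE / SST."""
    mean = sum(observed) / len(observed)
    sst = sum((y - mean) ** 2 for y in observed)
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - sse / sst


def unexplained_share(r2):
    """Proportion of response variance that a model leaves unexplained."""
    return 1.0 - r2


# Hypothetical observed responses and fitted values (fitted = group means).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
toy_r2 = r_squared(observed, predicted)

# Unexplained share implied by the maximum R-squared of .16.
thesis_unexplained = unexplained_share(0.16)
```

For the toy data the total sum of squares is 5 and the residual sum of squares is 1, so R² is 0.8. Applying the same complement to R² = .16 gives the 84% unexplained share cited above.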

A  likely  place  to  start  improving  the  development  of  such  models  is  in  the  job  performance 
measurement  realm.  But,  as  has  been  proven  over  time,  it  is  extremely  difficult  to  develop  a  sound 
yet  cost  efficient  system  for  collecting  valid  and  reliable  job  performance  data.  This  problem  has 
been so pervasive in Industrial/Organizational Psychology that it has earned the fear-instilling
name the criterion problem. Volumes have been written on job performance measurement and the
criterion problem. The topic of job performance measurement cannot be given the attention it is
due in this limited space; thus the reader is referred to Reference (6) for an introduction to the
topic.

Figure 27. Pie Chart Representing the Explained vs. Unexplained Variance in Productive Capacity
Given the Current Models

Another likely area to be considered when seeking to improve job performance models is
the predictor arena. Only two potential predictors were considered in the current effort, aptitude
and experience. As with job performance measurement, volumes have been written concerning the
relationship between numerous predictor variables and job performance. But, remember that PC
is a fairly unique job performance measure in that it is supposed to measure a worker’s capacity
to produce, not how much he or she actually produces. This implies that many of the personality
traits which would be expected to influence productivity would not be expected to influence PC.
Such measures include worker motivation, job interest, work environment, and job satisfaction.

There still remain numerous potential predictors which would be expected to influence PC.
These include the type and amount of technical school training, the type and amount of
on-the-job training (OJT), the availability and quality of written technical guidance, the amount of
technical interaction with highly-skilled individuals, trouble-shooting and diagnostic ability, and
general mental ability, just to name a few. Many such predictors could be considered for inclusion
into Air Force job performance models. They may perhaps help to explain additional variance in
the response.

A final area for model improvement might be the type of model itself. Perhaps linear
regression-based models are simply insufficient for modeling the job performance of human
beings. Humans are obviously highly complex entities with each being motivated and affected by
countless factors. Added to this, the countless factors each influence different people in different
ways. For these reasons alone, linear regression models may never be able to explain the majority
of the variance in job performance.

In summary, there is significant improvement to be made in job performance modeling. Possible
improvements could be made by improving the validity and reliability of the response (job
performance measures), by considering other potential predictors, and by considering different types
of mathematical or maybe even non-mathematical models.

Appendix A. Tasks Studied Under the Productive Capacity Project


Table  22:  454X1  Tasks  Studied  Under  the  Productive  Capacity 
Project 


Task   Description

E120   Make entries on supply issue and turn-in forms.
E143   Make entries on AFTO Form 350 (Reparable Item Processing Tag).
F153   Perform aircraft support air-conditioner visual and service inspection.
F154   Perform an aircraft support generator service inspection.
F155   Perform a service inspection on a load bank.
F157   Perform bomblift visual and service inspection.
F162   Perform a service inspection on a hydraulic test stand.
G171   Perform aircraft support air compressor periodic inspection.
G179   Perform combustor cap portion of a gas turbine compressor periodic inspection.
G181   Perform hydraulic test stand periodic inspection.
H202   Fabricate wiring.
H203   Isolate malfunction within electrical circuitry other than integrated or solid state.
H209   Measure resistance in AGE electrical systems by checking various circuits in the
       ignition system of the MC-2A.
H215   Perform AGE electrical systems operational checks.
H236   Research T.O.s, charts, or diagrams for electrical maintenance instructions.
H237   Solder electrical system wiring.
H238   Cut an electrical system wire in half and splice it together into a circle, using one
       crimp-type splice and one soldered heat shrink splice.
I247   Adjust distributor points.
I248   Adjust reciprocating engine fuel system components.
I251   Adjust turbine engine fuel system components.
I255   Change the generator in an NF-2.
I260   Clean commutator and slip rings on the generator of the NF-2.
I264   Troubleshoot the NF-2 generator for the following symptoms of malfunctions: (1) the
       engine will not start when cranked, and (2) the engine starts but backfires at the
       carburetor.
I275   Remove or install a carburetor on an MC-2A gasoline engine.
I283   Remove and install engine exhaust manifold, seals, gaskets, and common hardware.
I284   Remove and replace an alternator bell.
I286   Remove and install engine fuel pumps on the NF-2.
I299   Remove and install engine.
I300   Replace the flare fitting on a fuel line.
J332   Isolate the possible heater system malfunctions associated with a discrepancy that
       reads “burner will not ignite.”
J340   Remove the burner control valve from an AGE heater.
J347   Remove and install heater engine.
J355   Remove and install temperature selector valve.
L406   Isolate hydraulic systems malfunction.
L421   Remove and install hydraulic lines on B-1 stand.
L436   Replace O-rings in hydraulic systems component.
L437   Research T.O.s, charts, or diagrams for AGE hydraulic systems maintenance.
M444   Assemble bleed air hose.
M446   Troubleshoot the MC-1A compressor for the discrepancy “Compressor fails to unload
       at 3600 psi.”
M447   Perform AGE pneumatic system operational check.
N475   Isolate brake system malfunction.
N477   Repack wheel bearings of one wheel on AGE equipment (NF-2).
N486   Remove and install AGE brake pads.
N487   Remove and install AGE fuel tank.
N488   Change an AGE tire and tube assembly.
N494   Remove and install one six inch bolted hinge.
N503   Look up the part number, source code, and work unit code to requisition a new axle
       assembly for an MC-2A compressor (with data plate containing the following
       information: MFG- Davey Compressor Company, Contract #-DSA 700-74C-9004,
       Serial #-16160, Reg #-4310-75-018-6160, Model #-2MC-2, Part #-27391).
P549   Perform an operator’s inspection of an AF vehicle, completing AFTO Form 373.
P554   Pick up and deliver -60.
P555   Prepare AGE (NF-2) for shipment during a training exercise or mobilization.

The above task descriptions were taken from Reference (24).